Abstract

We give a deterministic, polynomial-time algorithm for approximately counting the number
of $\{0,1\}$-solutions to any instance of the knapsack problem. On an instance of length $n$ with
total weight $W$ and accuracy parameter $\varepsilon$, our algorithm produces a $(1+\varepsilon)$-multiplicative
approximation in time $\mathrm{poly}(n, \log W, 1/\varepsilon)$. We also give algorithms with identical guarantees
for general integer knapsack, the multidimensional knapsack problem (with a constant number
of constraints), and contingency tables (with a constant number of rows). Previously, only
randomized approximation schemes were known for these problems, due to work by Morris and
Sinclair and work by Dyer.

Our algorithms work by constructing small-width, read-once branching programs for approximating
the underlying solution space under a carefully chosen distribution. As a byproduct
of this approach, we obtain new query algorithms for learning functions of $k$ halfspaces with
respect to the uniform distribution on $\{0,1\}^n$. The running time of our algorithm is polynomial
in $1/\varepsilon$. Previously, even for the case of $k = 2$, only algorithms with an
exponential dependence on $1/\varepsilon$ were known.
1 Introduction



In this paper we give the first deterministic, polynomial-time approximation schemes for several 
well-studied #P-hard counting problems such as knapsack and multidimensional knapsack. There 
are many celebrated, randomized polynomial-time algorithms for approximately counting #P-hard 
problems (for example, approximating the permanent [JSV04]). There are far fewer examples, 
however, of deterministic approximation algorithms for #P-hard problems. A few notable examples 
can be found in [Wei06], [BGK+07], [HKM+09]. 

The knapsack counting problem (#KNAP) is defined as follows: given a non-negative vector
$a \in \mathbb{Z}_+^n$ and a non-negative $b \in \mathbb{Z}_+$, count the size of the set $\mathrm{KNAP}(a,b) = \{x \in \{0,1\}^n : \sum_i a_i x_i \le b\}$. It is well known that the #KNAP problem is #P-hard, and much attention has been given
to the problem of approximately counting the size of $\mathrm{KNAP}(a,b)$. More specifically, given an
error parameter $\varepsilon$, we are interested in finding a value $p$ such that $|\mathrm{KNAP}(a,b)| \le p \le (1+\varepsilon)|\mathrm{KNAP}(a,b)|$ in time polynomial in $n$ and $1/\varepsilon$ (such a value $p$ is often referred to as an $\varepsilon$-relative-error
approximation, or $\varepsilon$-approximation for short).
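As a concrete reference point for these definitions, the brute-force Python sketch below computes $|\mathrm{KNAP}(a,b)|$ by enumerating all $2^n$ strings and checks the $\varepsilon$-approximation condition. The helper names `count_knapsack_bruteforce` and `is_eps_approximation` are hypothetical. Because enumeration takes exponential time, the code is useful only for validating faster methods on tiny instances; it is not an algorithm from this paper.

```python
from itertools import product


def count_knapsack_bruteforce(a, b):
    """Exact |KNAP(a, b)| = #{x in {0,1}^n : sum_i a_i x_i <= b}, by enumerating all 2^n strings."""
    return sum(
        1
        for x in product((0, 1), repeat=len(a))
        if sum(ai * xi for ai, xi in zip(a, x)) <= b
    )


def is_eps_approximation(p, exact, eps):
    """True iff p satisfies |KNAP| <= p <= (1 + eps) * |KNAP|."""
    return exact <= p <= (1 + eps) * exact


# Example: a = (1, 2, 3), b = 3 has the 5 solutions 000, 100, 010, 001, 110.
print(count_knapsack_bruteforce([1, 2, 3], 3))  # prints 5
```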

Dyer et al. [DFK+93] were the first to study the problem of approximately solving #KNAP
and gave a sub-exponential time algorithm for the problem. Morris and Sinclair [MS04] were the
first to give a polynomial-time, randomized approximation scheme (FPRAS) for #KNAP; they use
a rapidly mixing Markov chain to sample randomly from the solution space. Subsequently, Dyer
[Dye03] gave a simpler FPRAS for #KNAP based on dynamic programming. In this work, we give
the first deterministic polynomial-time approximate counting algorithm for #KNAP¹:

Theorem 1.1 (deterministic counting for knapsack). Given a knapsack instance $(a,b) \in \mathbb{Z}_+^n \times \mathbb{Z}_+$ with
weight $W = \sum_i a_i + b$ and $\varepsilon > 0$, there is a deterministic $O(n^3 \log(W) \log(n/\varepsilon)/\varepsilon)$ time algorithm
that computes an $\varepsilon$-relative error approximation for $|\mathrm{KNAP}(a,b)|$.

Our algorithm is simple and yields a fast method for generating a uniformly random element 
of KNAP(a, b). The algorithm is inspired by a recent work due to Meka and Zuckerman [MZ10] on 
monotone branching programs in the context of building pseudorandom generators for halfspaces. 
Further, we show how to extend our algorithm to work with respect to a broad class of natural 
non-uniform distributions on $\{0,1\}^n$, including all symmetric and product distributions. To the
best of our knowledge, no efficient algorithms (randomized or otherwise) for counting with respect 
to these natural distributions were known previously (see Section 1.1 for more details). 

Morris and Sinclair [MS04] and Dyer [Dye03] also gave an FPRAS for counting the number 
of solutions to the multidimensional knapsack problem for a constant number of constraints. In 
this problem, we are given $k$ knapsack instances $(a_1,b_1), \ldots, (a_k,b_k) \in \mathbb{Z}_+^n \times \mathbb{Z}_+$ and $\varepsilon > 0$, and the goal
is to compute the number of solutions satisfying all constraints, i.e., to compute $|\mathrm{KNAP}(a_1,b_1) \cap \mathrm{KNAP}(a_2,b_2) \cap \cdots \cap \mathrm{KNAP}(a_k,b_k)|$. We obtain a deterministic algorithm for this problem that runs
in polynomial time for $k = O(1)$:

Theorem 1.2 (multidimensional knapsack). Given knapsack instances $\mathrm{KNAP}(a_1,b_1), \ldots, \mathrm{KNAP}(a_k,b_k)$
of total weight at most $W$, there is a deterministic $O((n/\varepsilon)^{O(k)} \cdot \log W)$ algorithm that computes
an $\varepsilon$-relative error approximation to the number of solutions satisfying all the knapsack constraints.

Our solution has two components. First, we generalize our counting algorithm for a single knapsack
constraint to work with respect to non-uniform distributions representable by small-width branching
programs. We then use Dyer's elegant rounding results to reduce multidimensional knapsack
counting to counting a single knapsack under such distributions.

¹Throughout this paper we assume integer addition to be of unit cost.
Our techniques also apply to the problem of counting the solutions of integer-valued knapsack
instances. Here the goal is to estimate the size of the set of solutions $\mathrm{KNAP}(a,b,u) = \{x \in \mathbb{Z}^n : \sum_{i \le n} a_i x_i \le b,\ 0 \le x_i \le u_i\}$. Note that the range sizes $u_1, \ldots, u_n$ could be exponential in $n$. Dyer
[Dye03] gave an FPRAS for the integer-valued case as well. We obtain an FPTAS for the problem.
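For intuition about the set $\mathrm{KNAP}(a,b,u)$, the following minimal Python sketch (the helper name `count_integer_knapsack` is hypothetical) counts it exactly by dynamic programming over the partial sums $0, \ldots, b$. Its running time is pseudo-polynomial: it grows with $b$ and the $u_i$ themselves rather than with their logarithms, which is exactly the dependence Theorem 1.3 avoids. It is included only as a reference counter for small instances.

```python
def count_integer_knapsack(a, b, u):
    """Exact |KNAP(a, b, u)| = #{x : sum_i a_i x_i <= b, 0 <= x_i <= u_i}, pseudo-polynomial in b."""
    ways = [1] + [0] * b  # ways[s] = number of prefixes (x_1..x_j) with partial sum exactly s
    for ai, ui in zip(a, u):
        new = [0] * (b + 1)
        for s, cnt in enumerate(ways):
            if cnt:
                for xi in range(ui + 1):
                    t = s + ai * xi
                    if t > b:
                        break  # larger x_i only increases the partial sum further
                    new[t] += cnt
        ways = new
    return sum(ways)


# With u = (1, ..., 1) this reduces to the 0/1 case: KNAP((1, 2, 3), 3) has 5 solutions.
print(count_integer_knapsack([1, 2, 3], 3, [1, 1, 1]))  # prints 5
```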

Theorem 1.3 (integer knapsack). Given a knapsack instance $\mathrm{KNAP}(a,b,u)$ with weight $W = \sum_i a_i u_i + b$, $U = \max_i u_i$ and $\varepsilon > 0$, there is a deterministic $O(n^5 (\log U)^2 (\log W)/\varepsilon^2)$ algorithm
that computes an $\varepsilon$-relative error approximation for $|\mathrm{KNAP}(a,b,u)|$.

We also obtain similar results for counting the number of integer-valued contingency tables.
Given row sums $r = (r_1, \ldots, r_m) \in \mathbb{Z}_+^m$ and column sums $c = (c_1, \ldots, c_n) \in \mathbb{Z}_+^n$, let $\mathrm{CT}(r,c) \subseteq \mathbb{Z}_+^{m \times n}$
denote the set of integer-valued contingency tables with row and column sums given by $r, c$:

$$\mathrm{CT}(r,c) = \Big\{ X \in \mathbb{Z}_+^{m \times n} : \sum_{j} X_{ij} = r_i,\ i \in [m];\ \sum_{i} X_{ij} = c_j,\ j \in [n] \Big\}.$$
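A brute-force reference counter for $\mathrm{CT}(r,c)$ on tiny inputs can help pin down the definition. The Python sketch below (all names hypothetical) fills one column at a time and memoizes on the vector of row sums still to be filled. It is pseudo-polynomial in the row and column sums and is not the algorithm behind Theorem 1.4.

```python
from functools import lru_cache


def count_contingency_tables(r, c):
    """Exact |CT(r, c)| for non-negative integer row sums r and column sums c (tiny inputs only)."""
    if sum(r) != sum(c):
        return 0  # every table has total sum(r) = sum(c)
    if not r:
        return 1  # no rows and all column sums zero: only the empty table

    def column_fillings(total, caps):
        """Yield all tuples of len(caps) non-negative parts summing to total, part i <= caps[i]."""
        if len(caps) == 1:
            if total <= caps[0]:
                yield (total,)
            return
        for first in range(min(total, caps[0]) + 1):
            for rest in column_fillings(total - first, caps[1:]):
                yield (first,) + rest

    @lru_cache(maxsize=None)
    def count(j, remaining):
        """Number of ways to fill columns j, j + 1, ... given the row sums still to be filled."""
        if j == len(c):
            return 1 if not any(remaining) else 0
        return sum(
            count(j + 1, tuple(rem - x for rem, x in zip(remaining, col)))
            for col in column_fillings(c[j], remaining)
        )

    return count(0, tuple(r))


# Two rows with sums (3, 3) and three columns with sums (2, 2, 2): the first row is any
# (x_1, x_2, x_3) in {0, 1, 2}^3 with x_1 + x_2 + x_3 = 3, giving 7 tables.
print(count_contingency_tables((3, 3), (2, 2, 2)))  # prints 7
```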

Note that, as in the case of knapsack, the magnitude of the row and column sums could be
exponential in $n$. Dyer [Dye03] gave an FPRAS for counting solutions to contingency tables (with
a constant number of rows) based on dynamic programming. We give an FPTAS for this problem:

Theorem 1.4 (contingency tables with few rows). Given row sums $r = (r_1, \ldots, r_m) \in \mathbb{Z}_+^m$
and column sums $c = (c_1, \ldots, c_n) \in \mathbb{Z}_+^n$ with $R = \max_i r_i$ and $\varepsilon > 0$, there is a deterministic
$(n^{O(m)} (\log R)/\varepsilon)^m$ algorithm that computes an $\varepsilon$-relative error approximation for $|\mathrm{CT}(r,c)|$.

All our counting results also give fast sampling algorithms. After a pre-processing phase, each 
new random sample can be generated in near-linear time, which improves considerably on previous 
sampling algorithms. 

Finally, we can use ideas motivated by our algorithm for counting knapsack solutions to learn 
functions of halfspaces with membership queries with respect to the uniform distribution on $\{0,1\}^n$:

Theorem 1.5. The concept class of arbitrary Boolean functions of $k$ halfspaces can be PAC learned
with membership queries under the uniform distribution on $\{0,1\}^n$ to accuracy $\varepsilon$ in time $(n/\varepsilon)^{O(k)}$.

Previous algorithms [KOS04] ran in time $n^{O(k^2/\varepsilon^2)}$ (without queries) or in time $\mathrm{poly}(n^{2k}, W^{2k}, 1/\varepsilon)$
(with queries), where $W$ is a bound on the weight of all halfspaces (which could be exponential in
$n$). Thus, even for the special case of learning the intersection of two halfspaces, known algorithms
had either an exponential dependence on $1/\varepsilon$ or a polynomial dependence on $W$. Our algorithm is
similar to Angluin's algorithm for learning finite automata [Ang87] (essentially, we reconstruct the
underlying approximating branching program). The analysis, however, is quite different, since we
learn functions that are not (exactly) computable by small-width ROBPs.

1.1 Outline of our Algorithms

Approximation by Branching Programs. All of our results revolve around the ability of read-once
branching programs (ROBPs) to approximate various classes of Boolean functions. Informally, an
ROBP of width $W$ is a labeled, layered directed graph with at most $W$ vertices per layer that
induces a Boolean function in the obvious manner: at layer $i$ we read the $i$'th bit of input, follow
the appropriate transition, and output the label of the final vertex reached (see Definition 2.1).

It is easy to see that a knapsack constraint of weight W (recall W may be exponential in n) can 
be computed exactly by a width-$W$ ROBP which keeps track of partial sums. Meka and Zuckerman
[MZ10], in their work on pseudorandom generators for halfspaces², proved the existence of a small-width
ROBP that approximates the solutions to a single knapsack constraint with small additive
(as opposed to multiplicative) error. To give an algorithm for approximately counting knapsack 
solutions, we show how to explicitly construct a small-width ROBP whose set of accepting strings 
is a multiplicative approximation to the set of strings satisfying the knapsack constraint. Our 
construction proceeds by sparsifying each layer in the exact ROBP for knapsack by retaining only 
a few carefully chosen representative partial sums. This uses the insight from [MZ10] that in the 
exact branching program for halfspaces, there is a natural ordering on the vertices in each layer 
induced by the partial sums. 

Building on these ideas, we give a query algorithm that can learn an unknown Boolean function
under the uniform distribution as long as it is approximated in a certain sense by a small-width
ROBP. Previous learning algorithms (e.g., Angluin's algorithm for learning finite automata) required
the function to be exactly computable by a small-width ROBP. Our notion of approximation
is somewhat subtle: it is stronger than being approximated by an ROBP under the uniform
distribution, but weak enough that any function of few halfspaces has such approximations.

Small-Space Sources. Extending our knapsack algorithm to multiple knapsack constraints 
is not immediate. The main obstacle is that the natural ROBP for the intersection of knapsack 
constraints which keeps track of all partial sums does not have a total ordering on its vertices, and 
our knapsack algorithm crucially uses such a total ordering. One can still construct a small-width
ROBP that additively approximates the number of solutions to multidimensional knapsack (as in 
[GOWZ10]), but even the existence of a small-width multiplicative approximation is unclear. 

To circumvent this issue, we first generalize our algorithms to counting knapsack solutions with 
respect to small-space sources which were introduced by Kamp et al. [KRVZ06] in the context 
of randomness extraction. Informally, these are families of distributions on $\{0,1\}^n$ that can be
generated by small-width branching programs (see Section 2 for the formal definition). We show 
the following result for deterministically counting knapsack solutions under small space sources: 

Theorem 1.6 (counting under small-space sources). Fix a knapsack instance of total weight $W$
and error parameter $\varepsilon > 0$. Let $\mu$ be a distribution on $\{0,1\}^n$ with an explicitly given small-space
generator of width at most $S$, and define $\mu(\mathrm{KNAP}(a,b))$ as $\Pr_{x \sim \mu}[x \in \mathrm{KNAP}(a,b)]$. Then there is a
deterministic algorithm that runs in time $O(n^3 S(S + \log W) \log(n/\varepsilon)/\varepsilon)$ and computes an $\varepsilon$-relative
error approximation to $\mu(\mathrm{KNAP}(a,b))$.

Next, we use an elegant result of Dyer [Dye03], which, given an instance of multidimensional
knapsack, constructs a small-space source under which the set of solutions is polynomially dense.
It is easy to get an additive approximation for multidimensional knapsack using Theorem 1.6.
Dyer's result lets us transform a low additive error guarantee into a multiplicative error guarantee
and prove Theorem 1.2.

Small-space distributions include several natural distributions, such as all symmetric distributions
and product distributions. Thus, as a corollary to Theorem 1.6, we obtain FPTASes for several
interesting variants of knapsack for which, to the best of our knowledge, no polynomial-time
algorithms (even randomized ones) were known. For instance, we show:

Corollary 1.7. Given a knapsack instance $(a,b) \in \mathbb{Z}_+^n \times \mathbb{Z}_+$ of total weight $W = \sum_i a_i + b$, $\varepsilon > 0$ and
$r \in [n]$, we can in deterministic time $O(n^3 r(r + \log W)/\varepsilon)$ compute an $\varepsilon$-relative error approximation
for the number of solutions to the knapsack instance of Hamming weight exactly $r$.

²Halfspaces are equivalent to the characteristic functions of knapsack instances.
1.2 Related Work



Very recently, Stefankovic, Vempala, and Vigoda [SVV10] gave a deterministic FPTAS for the
knapsack problem with a running time of $O(n^3 \varepsilon^{-1} \log(n/\varepsilon))$. Their algorithm is based on dynamic
programming. Our work was obtained independently of theirs.

As mentioned earlier, Morris and Sinclair [MS04] and Dyer [Dye03] were the first to give an
FPRAS for the knapsack problem, with Dyer's more efficient algorithm taking time $O(n^{2.5}\sqrt{\log(n/\varepsilon)} + n^2 \varepsilon^{-2})$.
Dyer also gives a deterministic, polynomial-time algorithm that achieves a $\sqrt{n}$-factor approximation
for counting knapsack solutions.

The problem of approximately counting knapsack solutions is equivalent to the problem of
approximately counting the fraction of assignments that satisfy a linear threshold function or halfspace.
Servedio [Ser07] gave a deterministic algorithm for solving the latter problem to within an
additive $\varepsilon$ in time exponential in $1/\varepsilon^2$. Recently, Diakonikolas et al. [DGJ+09] gave a pseudorandom
generator for halfspaces with seed length $O(\log n/\varepsilon^2)$, and Meka and Zuckerman [MZ10] gave
a pseudorandom generator for halfspaces with seed length $O(\log n \log(1/\varepsilon))$ (enumerating over all
seeds results in a deterministic, additive-error approximation).

Many researchers in computational learning theory have studied the problem of learning functions
computable by read-once branching programs (for a discussion, see Bshouty et al. [BTW98]).
Positive results were known only for restricted classes of ROBPs, such as width-2 ROBPs [EKR95,
BTW98]; these algorithms use queries and succeed in the distribution-free model of learning, and they
do not apply in our setting. Our algorithms learn concept classes that are closely approximated by
small-width ROBPs (with respect to the uniform distribution on $\{0,1\}^n$).

2 Preliminaries 

2.1 Read-Once Branching Programs 

Definition 2.1 (ROBP). An $(S,T)$-branching program $M$ is a layered multi-graph with a layer for
each $0 \le i \le T$ and at most $S$ vertices (states) in each layer. The first layer has a single vertex
$v_0$, and each vertex in the last layer is labeled with $0$ (rejecting) or $1$ (accepting). For $0 \le i < T$, a
vertex $v$ in layer $i$ has two outgoing edges labeled $0, 1$ and ending at vertices in layer $i+1$.

Note that by definition, an (S, T)-branching program is read-once. We also use the following 
notation. Let M be an (S, T)-branching program and v a vertex in layer i of M. 

1. For a string $z$, $M(v,z)$ denotes the state reached by starting from $v$ and following the edges
labeled with $z$.

2. For $z \in \{0,1\}^T$, let $M(z) = 1$ if $M(v_0,z)$ is an accepting state, and $M(z) = 0$ otherwise.

3. $A_M(v) = \{z : M(v,z) \text{ is accepting in } M\}$, and $P_M(v)$ is the probability that $M(v,z)$ is an
accepting state for $z$ chosen uniformly at random.

4. $L(M,i)$ denotes the vertices in layer $i$ of $M$.

5. For a set $U$, $x \in_u U$ denotes a uniformly random element of $U$.
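The notation above can be mirrored in a small Python sketch. The encoding is an assumption made for illustration: the vertices of each layer are numbered $0, 1, \ldots$; `edges[i][v] = (w0, w1)` lists the two successors of vertex $v$ in layer $i$; vertex 0 of layer 0 is $v_0$; and `accept` labels the last layer. The method `acceptance_probabilities` computes $P_M(v)$ for every vertex with the backward recurrence $P_M(v) = (P_M(M(v,0)) + P_M(M(v,1)))/2$, using exact rationals.

```python
from fractions import Fraction


class ROBP:
    """A read-once branching program in the sense of Definition 2.1 (hypothetical encoding).

    edges[i][v] = (w0, w1) are the vertices of layer i + 1 reached from vertex v of layer i
    on input bits 0 and 1; accept[v] labels vertex v of the last layer T.
    """

    def __init__(self, edges, accept):
        self.edges = edges
        self.accept = accept
        self.T = len(edges)

    def walk(self, i, v, z):
        """M(v, z): follow the edges labeled by the bits of z, starting at vertex v of layer i."""
        for bit in z:
            v = self.edges[i][v][bit]
            i += 1
        return v

    def __call__(self, z):
        """M(z) = 1 iff the walk from the start vertex v_0 ends in an accepting vertex."""
        return int(self.accept[self.walk(0, 0, z)])

    def acceptance_probabilities(self):
        """P[i][v] = P_M(v) for uniformly random suffixes, computed backward layer by layer."""
        P = [None] * (self.T + 1)
        P[self.T] = [Fraction(int(a)) for a in self.accept]
        for i in range(self.T - 1, -1, -1):
            P[i] = [(P[i + 1][w0] + P[i + 1][w1]) / 2 for (w0, w1) in self.edges[i]]
        return P


# Width-2 program for x_1 AND x_2: vertex 0 means "all ones so far", vertex 1 means "dead".
M = ROBP(edges=[[(1, 0)], [(1, 0), (1, 1)]], accept=[True, False])
print(M([1, 1]), M.acceptance_probabilities()[0][0])  # prints 1 1/4
```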

2.2 Small-Space Sources 

Small-space sources were introduced by Kamp et al. [KRVZ06] in their work on randomness extractors,
as a generalization of many commonly studied distributions, such as bit-fixing sources and
Markov-chain sources.
Definition 2.2 (small-space sources, Kamp et al.). A width-$w$ small-space source is described by
a $(w,n)$-branching program $D$ with an additional probability distribution $p_v$ on the outgoing edges
associated with each vertex $v \in D$. Samples from the source are generated by taking a random walk on
$D$ according to the $p_v$'s and outputting the labels of the edges traversed.
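Definition 2.2 translates directly into a sampler. In the Python sketch below, the encoding (`edges`, `prob_one`) and the function names are hypothetical, and `prob_one[i][u]` stands for $p_u(1)$. As an example source, `exact_weight_source` builds the width-$(r+1)$ source for the uniform distribution on strings of Hamming weight exactly $r$. Its state counts the ones emitted so far; with $m$ positions left and $k$ ones still needed, the next bit is 1 with probability $\binom{m-1}{k-1}/\binom{m}{k} = k/m$.

```python
import random


def sample_source(edges, prob_one, rng):
    """Draw one x ~ D by a random walk on the source's branching program.

    Hypothetical encoding: edges[i][u] = (w0, w1) are the successors of vertex u of layer i,
    the walk starts at vertex 0 of layer 0, and prob_one[i][u] is p_u(1), the probability of
    taking the edge labeled 1 out of u.
    """
    u, x = 0, []
    for i in range(len(edges)):
        bit = 1 if rng.random() < prob_one[i][u] else 0
        x.append(bit)
        u = edges[i][u][bit]
    return x


def exact_weight_source(n, r):
    """Width-(r + 1) source for the uniform distribution on {0,1}^n strings of weight exactly r.

    Vertex c of layer i records that c ones were output so far; with m = n - i positions left
    and k = r - c ones still needed, the next bit is 1 with probability k / m (clamped to 1 on
    vertices that the walk can never reach).
    """
    edges = [[(c, min(c + 1, r)) for c in range(r + 1)] for _ in range(n)]
    prob_one = [[min(1.0, (r - c) / (n - i)) for c in range(r + 1)] for i in range(n)]
    return edges, prob_one


rng = random.Random(0)
edges, prob_one = exact_weight_source(6, 2)
print(sum(sample_source(edges, prob_one, rng)))  # prints 2
```

A product distribution is the special case of width 1: `edges[i] = [(0, 0)]` and `prob_one[i] = [p_i]`.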

We will often abuse notation and denote the distribution generated by a small-space source and 
the source itself by D. Also, we will assume that the distribution D is given to us explicitly as a 
small-space source. Several natural distributions such as all symmetric distributions and product 
distributions can be generated by a small-space source. The following claims are straightforward: 

Claim 2.3. Given an ROBP $M$ of width at most $W$ and a small-space source $D$ of width at most
$S$, $\Pr_{x \sim D}[M(x) = 1]$ can be computed exactly via dynamic programming in time $O(n \cdot S \cdot W)$.
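One way to realize the dynamic program of Claim 2.3 is to push probability mass forward over pairs (vertex of $D$, vertex of $M$). There are at most $S \cdot W$ such pairs per layer, so the total work is $O(n \cdot S \cdot W)$. The Python sketch below assumes the same hypothetical `edges[i][v] = (w0, w1)` encoding for both programs (start vertex 0), and uses exact fractions.

```python
from fractions import Fraction


def acceptance_under_source(m_edges, m_accept, d_edges, d_prob_one):
    """Exact Pr_{x ~ D}[M(x) = 1] by dynamic programming over pairs (vertex of D, vertex of M).

    m_edges/m_accept describe the ROBP M; d_edges/d_prob_one describe the source D, where
    d_prob_one[i][u] is the probability that D emits bit 1 from vertex u of layer i.
    """
    layer = {(0, 0): Fraction(1)}  # (vertex of D, vertex of M) -> probability mass
    for i in range(len(m_edges)):
        nxt = {}
        for (u, v), mass in layer.items():
            p1 = Fraction(d_prob_one[i][u])
            for bit, p in ((0, 1 - p1), (1, p1)):
                if p == 0:
                    continue  # this edge of D is never taken
                key = (d_edges[i][u][bit], m_edges[i][v][bit])
                nxt[key] = nxt.get(key, 0) + mass * p
        layer = nxt
    return sum(mass for (_, v), mass in layer.items() if m_accept[v])


# M computes x_1 AND x_2; D is the product distribution with Pr[x_1 = 1] = 1/3, Pr[x_2 = 1] = 1/2.
print(acceptance_under_source([[(1, 0)], [(1, 0), (1, 1)]], [True, False],
                              [[(0, 0)], [(0, 0)]], [[Fraction(1, 3)], [Fraction(1, 2)]]))  # prints 1/6
```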

Claim 2.4. Given a $(W,n)$-ROBP $M$, the uniform distribution over $M$'s accepting inputs, $\{x : M(x) = 1\}$, is a width-$W$ small-space source.
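Here is a sketch of the construction behind Claim 2.4, assuming the same hypothetical encoding. First count accepting suffixes $N(v) = |A_M(v)|$ backward. Then let the source reuse $M$'s graph and take the edge labeled $b$ out of $v$ with probability $N(M(v,b))/N(v)$. The probability of any accepting string is a telescoping product equal to $1/N(v_0)$, so the output is uniform over $\{x : M(x) = 1\}$.

```python
from fractions import Fraction


def uniform_accepting_source(edges, accept):
    """Turn an ROBP M into a small-space source D for the uniform distribution on {x : M(x) = 1}.

    The source reuses M's graph (so its width is M's width). From vertex v it emits bit b with
    probability N(M(v, b)) / N(v), where N(v) = |A_M(v)| counts accepting suffixes. Vertices
    with N(v) = 0 are never visited by the walk, so their probability is set to 0 arbitrarily.
    """
    T = len(edges)
    N = [None] * (T + 1)
    N[T] = [int(a) for a in accept]
    for i in range(T - 1, -1, -1):
        N[i] = [N[i + 1][w0] + N[i + 1][w1] for (w0, w1) in edges[i]]
    prob_one = [
        [Fraction(N[i + 1][w1], N[i][v]) if N[i][v] else Fraction(0)
         for v, (w0, w1) in enumerate(edges[i])]
        for i in range(T)
    ]
    return edges, prob_one


# M computes x_1 OR x_2 (layer 1: vertex 0 = "seen a 1", vertex 1 = "not yet").
_, prob_one = uniform_accepting_source([[(1, 0)], [(0, 0), (1, 0)]], [True, False])
print(prob_one)  # each of 01, 10, 11 gets probability 1/3
```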

3 An FPTAS for Counting Knapsack Solutions

As described in the introduction, we construct a small-width branching program that approximates 
the feasible solutions to the given knapsack instance. We start with the exact branching program 
for knapsack, which has width $W$ and where each state in layer $j$ corresponds to a possible value of
the partial sum $V_j = \sum_{i \le j} a_i x_i$. We will approximate this program with a small-width branching
program whose state space is a carefully chosen subset of the original state space. We then count 
the number of accepting solutions to the constructed small-width branching program exactly via 
dynamic programming. 

Our goal is to partition the states in layer $i$ into intervals $I_1 = \{v_1 = 0, \ldots, v_2 - 1\}$,
$I_2 = \{v_2, \ldots, v_3 - 1\}$, $\ldots$, $I_t = \{v_t, \ldots, v_{t+1} = W\}$ and have only one state for each interval. The intervals
should be such that the number of accepting suffixes for all the partial sums in an interval is roughly 
the same. We then rearrange the incoming edges from layer i — 1 appropriately. We refer to this 
process as rounding layer i. A problem with this approach is that counting the number of suffixes 
which accept from a given partial sum is another instance of knapsack. 

We handle this by building the small-width branching program backward, starting from the last
layer. When we round layer $i$, layers $i+1, \ldots, n$ have already been rounded. Thus given a
partial sum in layer i, we know the number of accepting suffixes in our branching program exactly 
and use these counts to partition layer i. We then show by induction that the resulting branching 
program gives a good approximation to the set of feasible knapsack solutions. 

We now give a formal description of this process. 

3.1 Constructing an Approximating Branching Program 

Let $M$ denote the exact branching program for $\sum_{i \le n} a_i x_i \le b$, which consists of $n+1$ layers
numbered from $0$ to $n$. We denote the set of states in layer $i$ by $L(M,i)$. Layer $0$ has a single start
state $s$. For $i \le n$, $L(M,i)$ has a state for each partial sum $\sum_{j \le i} a_j x_j$. Given a vertex $v$ in layer
$i-1$ and $x_i \in \{0,1\}$, the $x_i$'th neighbor of $v$ is $M(v, x_i) = v + a_i x_i$.

We construct a series of branching programs $M^n = M, M^{n-1}, \ldots, M^0$. We obtain $M^i$ from
$M^{i+1}$ by rounding the states in $L(M^{i+1}, i+1)$. More precisely, we set $L(M^i, i+1) = \{v_1, \ldots, v_t\} \subseteq L(M^{i+1}, i+1)$,
where the $v_j$'s are defined as follows. Let $v_1 = 0$. Given $v_j$, let

$$v_{j+1} = \min\{v : v > v_j \text{ and } 0 \le P_{M^{i+1}}(v) < P_{M^{i+1}}(v_j)/(1+\varepsilon)\}. \qquad (3.1)$$
Intuitively, state $v_j$ represents the interval $I_j = \{v_j, \ldots, v_{j+1} - 1\}$. When the acceptance probability
drops by a factor of $(1+\varepsilon)$, we start a new interval. Since $P_{M^i}(v_1) \le 1$ and $P_{M^i}(v_{t-1}) \ge 2^{-n}$, we
have $t = O(n/\varepsilon)$. Next we redirect the edges from level $i$ to level $i+1$: if there is an edge labeled $z$
entering a vertex $v \in I_j$, then we redirect the edge to vertex $v_j$. Note that rounding layer $n$ to get
$M^{n-1}$ is trivial: we keep just one accept state and one reject state, corresponding to partial sums
$0$ and $b+1$, respectively.
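The rounding rule (3.1) and the edge redirection can be sketched on an explicit table of acceptance probabilities. In this hypothetical Python helper, `P[v]` plays the role of $P_{M^{i+1}}(v)$ for the partial sums $v = 0, 1, \ldots$ of one layer, assumed non-increasing (as Lemma 3.1 below guarantees). The helper returns the representatives $v_1 = 0 < v_2 < \cdots$ and, for every $v$, the representative $v_j$ of the interval $I_j$ containing it. It uses a plain sequential scan over the layer.

```python
def round_layer(P, eps):
    """Pick representatives v_1 = 0 < v_2 < ... following (3.1), by a sequential scan.

    P[v] is the acceptance probability of partial sum v (v = 0, ..., len(P) - 1) in the
    already-rounded program, assumed non-increasing. Returns the representatives and
    rounded[v], the representative v_j of the interval I_j that contains v; edges entering
    v are redirected to rounded[v].
    """
    reps = [0]
    for v in range(1, len(P)):
        if P[reps[-1]] == 0:
            break  # all remaining states reject, so they share the last interval
        if P[v] < P[reps[-1]] / (1 + eps):
            reps.append(v)  # acceptance probability dropped by a factor (1 + eps)
    rounded, k = [], 0
    for v in range(len(P)):
        while k + 1 < len(reps) and reps[k + 1] <= v:
            k += 1
        rounded.append(reps[k])
    return reps, rounded


reps, rounded = round_layer([1, 1, 0.9, 0.5, 0.5, 0], eps=0.2)
print(reps, rounded)  # prints [0, 3, 5] [0, 0, 0, 3, 3, 5]
```

Each state is rounded down to the left end of its interval, i.e., to a smaller partial sum that accepts at least as many suffixes. This is the direction Lemma 3.3 below relies on.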

Our branching programs have the following monotonicity property, which is easily verified by
induction. We omit the proof.

Lemma 3.1. Let $v, v' \in L(M^i, j)$ and $v \le v'$. For any suffix $z$, $M^i(v,z) \le M^i(v',z)$. Hence
$P_{M^i}(v) \ge P_{M^i}(v')$.

This property allows us to construct $M^i$ from $M^{i+1}$ efficiently. The key idea is that in
Equation (3.1), due to the ordering of the probabilities $P_M(\cdot)$, we can find $v_j$ by binary search
as opposed to sequential search, reducing the number of probability evaluations to $O(\log W)$ as opposed to $O(W)$.

Lemma 3.2. Each vertex $v_j \in L(M^i, i+1)$ can be computed in time $O(\log(n/\varepsilon) \log W)$.

Proof. The proof is by induction: we maintain the invariant that for every $l$, we have the vertices
$v_j$ of $L(M^l, l+1)$ stored in a binary tree and also know their acceptance probabilities $P_{M^l}(\cdot)$.

Suppose we have the above setup for $l > i$ and have computed $v_1, \ldots, v_j \in L(M^i, i+1)$. Recall
that $v_{j+1}$ is the smallest value of $v > v_j$ satisfying $P_{M^{i+1}}(v) < P_{M^{i+1}}(v_j)/(1+\varepsilon)$. Given a vertex
$v \in L(M^{i+1}, i+1)$, if $v_b = M^{i+1}(v,b)$ for $b \in \{0,1\}$, then $P_{M^{i+1}}(v) = (P_{M^{i+1}}(v_0) + P_{M^{i+1}}(v_1))/2$.
So $P_{M^{i+1}}(v)$ can be computed in time $O(\log(n/\varepsilon))$ using the values of $P_{M^{i+1}}(w)$ stored in a binary
tree for $w \in L(M^{i+1}, i+2)$.

Lemma 3.1 shows that $P_{M^{i+1}}(v)$ is non-increasing in $v$. So we can do binary search on $P_{M^{i+1}}(v)$.
Since $v \in \{0, \ldots, W\}$, this will require $O(\log W)$ computations of $P_{M^{i+1}}(v)$. Once we have computed
$L(M^i, i+1)$, we store these vertices and their probabilities of acceptance in a binary tree. □
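The binary search in the proof of Lemma 3.2 can be sketched as follows. This is a Python sketch: `next_representative` is a hypothetical name, and `P` is any callable that is non-increasing on $\{0, \ldots, W\}$ with $P(W) = 0 < P(v_j)$ and $v_j < W$. It returns the smallest $v > v_j$ with $P(v) < P(v_j)/(1+\varepsilon)$ using $O(\log W)$ evaluations of `P`.

```python
from fractions import Fraction


def next_representative(P, v_j, W, eps):
    """Smallest v in (v_j, W] with P(v) < P(v_j) / (1 + eps), found by binary search.

    Assumes P is non-increasing on {0, ..., W} (Lemma 3.1), v_j < W, and P(W) = 0 < P(v_j),
    so that a qualifying v exists.
    """
    threshold = P(v_j) / (1 + eps)
    lo, hi = v_j + 1, W  # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if P(mid) < threshold:
            hi = mid  # mid qualifies, so the answer is at most mid
        else:
            lo = mid + 1  # by monotonicity, nothing at or below mid qualifies
    return lo


def example_P(v):
    """A toy non-increasing acceptance profile on {0, ..., 10}."""
    return Fraction(max(0, 10 - v), 10)


print(next_representative(example_P, 0, 10, Fraction(1, 4)))  # prints 3
```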

Thus, we can construct $M^0$ from $M$ in time $O(n^2 \log(W) \log(n/\varepsilon)/\varepsilon)$. We now address the
approximation guarantee. We start by showing that the set of accepted strings grows as we proceed
from $M$ to $M^0$.

Lemma 3.3. For $v \in M^i$ we have $A_{M^{i+1}}(v) \subseteq A_{M^i}(v)$. Thus, by induction on $i$, $P_{M^i}(v) \ge P_M(v)$.

Proof. Note that the claim is only interesting for $v \in L(M^i, j)$ where $j \le i$, since for $j \ge i+1$,
both $M^i$ and $M^{i+1}$ make identical transitions from $v$, and so $A_{M^i}(v) = A_{M^{i+1}}(v)$.

Let $j = i$. Let $v_b = M^i(v,b)$ for $b \in \{0,1\}$. Since $M^i$ is obtained from $M^{i+1}$ by rounding layer
$i+1$, there are vertices $v'_b = M^{i+1}(v,b)$ for $b \in \{0,1\}$ in $L(M^{i+1}, i+1)$ such that $v'_b \ge v_b$; hence
$A_{M^{i+1}}(M^{i+1}(v,b)) = A_{M^{i+1}}(v'_b) \subseteq A_{M^{i+1}}(v_b) = A_{M^i}(M^i(v,b))$. Thus the set of accepting suffixes
can only increase for either choice of $b$, and the claim is proved.

The claim for $j < i$ follows since $M^i$ and $M^{i+1}$ are identical up to layer $i$, and for every
$v \in L(M^i, i)$ we have $A_{M^{i+1}}(v) \subseteq A_{M^i}(v)$. □

Next we show that the set of accepting strings does not grow by too much. 

Lemma 3.4. For any vertex $v \in M^i$, we have $P_{M^i}(v) \le P_M(v)(1+\varepsilon)^{n-i}$.

Proof. It is sufficient to show that for every $i < n$ and $v \in M^i$, $P_{M^i}(v) \le P_{M^{i+1}}(v)(1+\varepsilon)$. Let
$v \in L(M^i, j)$. The above is trivial when $j \ge i+1$, since $A_{M^i}(v) = A_{M^{i+1}}(v)$ for such vertices $v$.
Indeed, it suffices to consider the case when $j = i$, since for $j < i$, $M^i$ and $M^{i+1}$ are identical up
to layer $i$. Hence we can express both $P_{M^{i+1}}(v)$ and $P_{M^i}(v)$ as the same convex combination of
acceptance probabilities of vertices in layer $i$.

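Let $j = i$. Fix a vertex $v \in L(M^i, i)$. Let $v_b = M^i(v,b)$ for $b \in \{0,1\}$ be the vertices reached
in $L(M^i, i+1)$. Since $M^i$ is obtained from $M^{i+1}$ by rounding the $(i+1)$st layer, there are vertices
$v'_b = M^{i+1}(v,b)$ for $b \in \{0,1\}$ in $L(M^{i+1}, i+1)$ such that $P_{M^{i+1}}(v_b) \le (1+\varepsilon) P_{M^{i+1}}(v'_b)$. Since $M^i$ and $M^{i+1}$
agree beyond layer $i+1$, we also have $P_{M^i}(v_b) = P_{M^{i+1}}(v_b)$. Thus

$$P_{M^i}(v) = \frac{P_{M^i}(v_0) + P_{M^i}(v_1)}{2} \le (1+\varepsilon)\,\frac{P_{M^{i+1}}(v'_0) + P_{M^{i+1}}(v'_1)}{2} = (1+\varepsilon)\,P_{M^{i+1}}(v).$$

□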

We can now finish the proof of Theorem 1.1.

Proof of Theorem 1.1. Let $\delta$ denote the target accuracy. We set $\varepsilon = \Theta(\delta/n)$ so that $(1+\varepsilon)^n \le 1+\delta$. Using Lemma 3.2, we can
construct $M^0$ and compute $P_{M^0}(s)$, where $s$ is the start state, in the desired time bound. Applying
Lemma 3.3 and Lemma 3.4, we get $P_M(s) \le P_{M^0}(s) \le (1+\delta)P_M(s)$. The number of knapsack
solutions is precisely $2^n P_M(s)$. Hence we output $2^n P_{M^0}(s)$. □
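Putting Section 3 together, here is a compact Python sketch of the whole counting procedure, under the following assumptions:

- exact rational arithmetic via `fractions.Fraction`;
- all partial sums above $b$ merged into a single reject state $b+1$, so layers range over $\{0,\ldots,b+1\}$;
- each rounded layer stored as a sorted list searched with `bisect`, in place of a binary tree;
- the per-layer parameter $\varepsilon = \delta/(2n)$, which for $0 < \delta \le 1$ gives $(1+\varepsilon)^n \le e^{\delta/2} \le 1+\delta$.

The function name is hypothetical. The code illustrates the construction; it is not the authors' implementation.

```python
from bisect import bisect_right
from fractions import Fraction


def approx_count_knapsack(a, b, delta):
    """Return p with |KNAP(a, b)| <= p <= (1 + delta) |KNAP(a, b)|, for 0 < delta <= 1.

    Rounds layers n - 1, ..., 1 of the exact program backward, keeping for each layer the
    representatives v_1 = 0 < v_2 < ... of (3.1) with their acceptance probabilities, and
    outputs 2^n P_{M^0}(s).
    """
    n = len(a)
    if n == 0:
        return 1  # only the empty string, which is a solution for any b >= 0
    eps = Fraction(delta) / (2 * n)  # per-layer error: (1 + eps)^n <= 1 + delta
    reps, Q = [0, b + 1], [Fraction(1), Fraction(0)]  # layer n: accept 0, reject b + 1

    def rounded_prob(j, v):
        """P(v) for partial sum v of layer j, given the rounded layer j + 1 in (reps, Q)."""
        if v > b:
            return Fraction(0)
        total = Fraction(0)
        for w in (v, v + a[j]):  # successors on bits 0 and 1 (weight a[j] read after layer j)
            k = bisect_right(reps, min(w, b + 1)) - 1  # round down to the interval's left end
            total += Q[k]
        return total / 2

    for j in range(n - 1, 0, -1):
        new_reps, new_Q = [0], [rounded_prob(j, 0)]
        while new_Q[-1] > 0:
            threshold = new_Q[-1] / (1 + eps)
            lo, hi = new_reps[-1] + 1, b + 1  # P(b + 1) = 0 < threshold, so hi qualifies
            while lo < hi:  # binary search, valid because P is non-increasing (Lemma 3.1)
                mid = (lo + hi) // 2
                if rounded_prob(j, mid) < threshold:
                    hi = mid
                else:
                    lo = mid + 1
            new_reps.append(lo)
            new_Q.append(rounded_prob(j, lo))
        reps, Q = new_reps, new_Q
    return 2 ** n * rounded_prob(0, 0)


print(approx_count_knapsack([1, 2, 3], 3, Fraction(1, 10)))  # prints 5 (the exact count here)
```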

We note that our algorithm also gives an efficient sampling scheme, since sampling from the set 
of accepting strings of a small-width branching program is easy. 

Theorem 3.5. There is a randomized algorithm which produces a uniformly random string from
the set of solutions to a knapsack instance $\mathrm{KNAP}(a,b)$. The algorithm takes a preprocessing time of
$O(n^3 \log(W) \log n)$ and then produces a uniformly random sample from the solution space in time
$O(n \log(1/\eta))$ with probability $1 - \eta$.

Note that when the algorithm fails, it does not output a solution. Any solutions it outputs are 
guaranteed to be distributed uniformly. 

Proof. We set $\delta = 0.1$ and construct $M^0$, which requires time $O(n^3 \log(W) \log(n))$. It is easy to see
that $M^i(v,z) \le M(v,z)$ for any vertex $v \in M^i$ and $i \le n$; hence $A_M(v) \subseteq A_{M^i}(v)$, and in particular
$A_M(s) \subseteq A_{M^0}(s)$. By Lemma 3.4, $|A_{M^0}(s)| \le 1.1|A_M(s)|$.

Further, it is easy to sample from $A_{M^0}(s)$ in time $O(n)$. Recall that we have $P_{M^0}(v)$ computed for
each state $v$. We start at $s$. From a current vertex $v$, we move to $v_b = M^0(v,b)$ for $b \in \{0,1\}$ with
probability $P_{M^0}(v_b)/(P_{M^0}(v_0) + P_{M^0}(v_1))$. This produces $z \in_u A_{M^0}(s)$. We check that $z$ is also a solution to
the original knapsack in time $O(n)$; this happens with probability at least $0.8$. By repeating this
$O(\log(\eta^{-1}))$ times, the failure probability becomes less than $\eta$. □
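The sampler from the proof of Theorem 3.5 can be sketched in Python as follows. To stay self-contained, the sketch rebuilds the rounded layers with a sequential scan (rather than the binary search of Lemma 3.2). It uses the per-layer parameter $\varepsilon = 1/(10n)$, so that $(1+\varepsilon)^n \le e^{0.1} < 1.11$ and each walk passes the final check with probability above $0.9$. These choices and all names are assumptions for illustration. Walking from $v$ to $v_b$ with probability proportional to $P_{M^0}(v_b)$ makes the probability of every accepting path telescope to $1/(2^n P_{M^0}(s))$.

```python
import random
from bisect import bisect_right
from fractions import Fraction


def build_rounded_program(a, b, eps):
    """Layers of M^0 for KNAP(a, b), built backward as in Section 3.1 (illustrative sketch).

    layers[j] = (reps, Q): representative partial sums of layer j in increasing order and
    their acceptance probabilities. Sums above b are merged into the reject state b + 1.
    """
    n = len(a)
    layers = [None] * (n + 1)
    layers[n] = ([0, b + 1], [Fraction(1), Fraction(0)])

    def succ(j, v, bit):
        """Index in layers[j + 1] of the rounded-down successor of partial sum v on `bit`."""
        reps = layers[j + 1][0]
        return bisect_right(reps, min(v + bit * a[j], b + 1)) - 1

    def prob(j, v):
        """Acceptance probability of partial sum v in layer j, given the rounded layer j + 1."""
        Q = layers[j + 1][1]
        return Fraction(0) if v > b else (Q[succ(j, v, 0)] + Q[succ(j, v, 1)]) / 2

    for j in range(n - 1, 0, -1):
        reps, Q = [0], [prob(j, 0)]
        for v in range(1, b + 2):  # sequential scan for (3.1)
            if Q[-1] == 0:
                break  # every remaining state rejects and joins the last interval
            p = prob(j, v)
            if p < Q[-1] / (1 + eps):
                reps.append(v)
                Q.append(p)
        layers[j] = (reps, Q)
    layers[0] = ([0], [prob(0, 0)])
    return layers, succ


def sample_knapsack_solution(a, b, rng, max_tries=100):
    """Uniform sample from KNAP(a, b), or None if all max_tries walks fail the final check."""
    n = len(a)
    if n == 0:
        return []
    layers, succ = build_rounded_program(a, b, Fraction(1, 10 * n))
    for _ in range(max_tries):
        v, x = 0, []  # v: current representative partial sum, starting at s = 0
        for j in range(n):
            reps, Q = layers[j + 1]
            k0, k1 = succ(j, v, 0), succ(j, v, 1)
            bit = 1 if rng.random() < float(Q[k1] / (Q[k0] + Q[k1])) else 0
            x.append(bit)
            v = reps[k1 if bit else k0]
        if sum(ai * xi for ai, xi in zip(a, x)) <= b:
            return x  # conditioned on passing this check, x is uniform over KNAP(a, b)
    return None


print(sample_knapsack_solution([1, 2, 3], 3, random.Random(1)))
```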



4 Monotone ROBPs, Small-Space Sources, and Counting Solutions to Multidimensional Knapsack

In this section, we consider more general models of computation and wider classes of distributions. 
We solve the approximate counting problem for the more general class of monotone read-once 
branching programs as defined in the work of Meka and Zuckerman [MZ10]. Further, we show how 
to deterministically approximate the acceptance probability under the natural and broader class of 
small-space sources introduced by Kamp et al. [KRVZ06]. 

Monotone ROBPs were introduced by Meka and Zuckerman [MZ10] in their work on pseudorandom
generators for halfspaces. In addition to halfspaces, the class of monotone ROBPs includes
read-once DNFs and read-once polynomial threshold functions (read-once PTFs).
Definition 4.1 (Monotone ROBP). A $(W,n)$-branching program $M$ is said to be monotone if
for all $i \le n$, there exists a total ordering $\prec$ on the vertices in $L(M,i)$ such that if $u \prec v$, then
$A_M(u) \subseteq A_M(v)$.

It is easy to see that the branching program for knapsack satisfies the above condition. Given
partial sums $v_j, v_k$, we say $v_j \prec v_k$ if $v_j > v_k$, since a larger partial sum means that fewer suffix
strings will be accepted. We say $u \preceq v$ if $u \prec v$ or $u = v$.

Since we deal with monotone ROBPs that potentially have width exponential in n, we require 
that M is described implicitly in the following sense: 

1. Ordering: given two states $u, v$, we can efficiently check if $u \prec v$ and, if so, find a $w$ that is
half-way between $u, v$, i.e., $\big|\,|\{x : u \prec x \preceq w\}| - |\{x : w \prec x \prec v\}|\,\big| \le 1$.

2. Transitions: Given any vertex of M we can compute the two neighbors of the vertex. 

We assume that the above two operations are of unit cost. 
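For the knapsack program, both operations reduce to simple arithmetic on partial sums. The Python class below is a hypothetical illustration with these choices:

- all sums above $b$ are merged into a reject state $b+1$;
- the ordering is $u \prec v \iff u > v$, as in the text;
- the half-way point is $w = \lfloor (u+v+1)/2 \rfloor$.

For $u \prec v$, the two sets in the ordering condition are $\{x : w \le x < u\}$ and $\{x : v < x < w\}$. Their sizes, $u - w$ and $w - v - 1$, differ by at most one.

```python
class ImplicitKnapsackLayer:
    """Hypothetical implicit description of layer i of the exact knapsack ROBP.

    States are partial sums 0, ..., b + 1, with every sum above b merged into b + 1.
    As in the text, u precedes v (u ≺ v) iff u > v: a larger partial sum accepts fewer
    suffixes. Each operation uses O(1) arithmetic operations.
    """

    def __init__(self, a, b, i):
        self.a, self.b, self.i = a, b, i

    def precedes(self, u, v):
        """u ≺ v, which implies A_M(u) ⊆ A_M(v)."""
        return u > v

    def halfway(self, u, v):
        """For u ≺ v, a w whose two sides {x : w <= x < u} and {x : v < x < w} differ by <= 1."""
        return (u + v + 1) // 2

    def neighbors(self, v):
        """Successors of v in layer i + 1 on bits 0 and 1 (the weight read here is a[i])."""
        return v, min(v + self.a[self.i], self.b + 1)


layer = ImplicitKnapsackLayer([3, 5], 6, 0)
print(layer.precedes(5, 2), layer.halfway(9, 2), layer.neighbors(4))  # prints True 6 (4, 7)
```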

Our counting result for monotone ROBPs is obtained by proving the following structural result 
for monotone ROBPs that we believe is of independent interest: 

Theorem 4.2 (Main). Given a $(W,n)$-monotone ROBP $M$, $\delta > 0$, and a small-space distribution
$D$ over $\{0,1\}^n$ of width at most $S$, there exists an $(O(n^2 S/\delta), n)$-monotone ROBP $M^0$ such that
for all $z$, $M(z) \le M^0(z)$ and

$$\Pr_{x \sim D}[M(x) = 1] \le \Pr_{x \sim D}[M^0(x) = 1] \le (1+\delta) \Pr_{x \sim D}[M(x) = 1].$$

Moreover, given an implicit description of $M$ and an explicit description of $D$, $M^0$ can be constructed
in deterministic time $O(n^3 S(S + \log W) \log(n/\delta)/\delta)$.

We prove Theorem 4.2 in Section 4.2. As discussed in the introduction, this theorem has 
many interesting consequences. We first derive these consequences before proving the theorem. 
Theorem 1.6 follows easily from the observation that a weight W halfspace is a (W, n)-monotone 
ROBP. Corollary 1.7 follows since the uniform distribution over strings of weight exactly r can be 
generated by a small space source of width at most r + 1. Further we can approximately count 
knapsack solutions with respect to all symmetric distributions, and all product distributions, since 
each of these can be generated by a small space source. 

4.1 An FPTAS for Multidimensional Knapsack

Combining Theorem 4.2 with a result due to Dyer [Dye03], we obtain a deterministic approximate
counting algorithm for multidimensional knapsack with a constant number of constraints,
matching Dyer's FPRAS up to polynomial factors. We use the following elegant rounding result
due to Dyer:

Theorem 4.3 (Dyer, [Dye03]). Given knapsack instances $\mathrm{KNAP}(a_1,b_1), \ldots, \mathrm{KNAP}(a_k,b_k)$, we can
deterministically in time $O(n^3 \log W)$ construct a new set of knapsack instances
$\mathrm{KNAP}(a'_1,b'_1), \ldots, \mathrm{KNAP}(a'_k,b'_k)$, each with a total weight of at most $O(n^3)$, such that
$\mathrm{KNAP}(a_i,b_i) \subseteq \mathrm{KNAP}(a'_i,b'_i)$ for all $1 \le i \le k$, and

$$\Big|\bigcap_i \mathrm{KNAP}(a'_i,b'_i)\Big| \le (n+1)^k \,\Big|\bigcap_i \mathrm{KNAP}(a_i,b_i)\Big|.$$
Proof of Theorem 1.2. We first use Dyer's algorithm to obtain low-weight knapsack instances
$\mathrm{KNAP}(a'_1,b'_1), \ldots, \mathrm{KNAP}(a'_k,b'_k)$ as in Theorem 4.3. Let $D$ be the uniform distribution over the set
$U = \bigcap_i \mathrm{KNAP}(a'_i,b'_i)$, and observe that by Claim 2.4, $D$ can be generated by an explicit small-space source of width
$O(n^{3k})$. For $i \in [k]$, let $M^i$ be a $(W,n)$-ROBP exactly computing the indicator function for
$\mathrm{KNAP}(a_i,b_i)$. Let $\delta = O(\varepsilon/k(n+1)^k)$, to be chosen later. Now, for every $i \in [k]$, by Theorem 4.2
we can explicitly, in time $n^{O(k)}(\log W)/\delta$, construct an $(n^{O(k)}/\delta, n)$-ROBP $M^i_\delta$ such that (since $M^i \le M^i_\delta$ pointwise)

$$\Pr_{x \sim D}[M^i_\delta(x) \ne M^i(x)] \le \delta.$$

Let $M$ be the $(n^{O(k^2)}/\delta^k, n)$-ROBP computing the intersection of the $M^i_\delta$ for $i \in [k]$, i.e., $M(x) =
\bigwedge_i M^i_\delta(x)$. Then, by a union bound,

$$\Pr_{x \sim D}\Big[M(x) \ne \bigwedge_i M^i(x)\Big] \le k\delta.$$

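On the other hand, by Theorem 4.3,

$$\Pr_{x \sim D}\Big[\bigwedge_i M^i(x) = 1\Big] \ge \frac{1}{(n+1)^k}.$$

Therefore, from the above two equations, using $M^i \le M^i_\delta$ pointwise and setting $\delta = \varepsilon/(2k(n+1)^k)$, we get that

$$\Pr_{x \sim D}\Big[\bigwedge_i M^i(x) = 1\Big] \le \Pr_{x \sim D}[M(x) = 1] \le (1+\varepsilon) \Pr_{x \sim D}\Big[\bigwedge_i M^i(x) = 1\Big].$$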

Thus, $p = \Pr_{x \in_u \{0,1\}^n}[x \in U] \cdot \Pr_{x \sim D}[M(x) = 1]$ is an $\varepsilon$-relative error approximation to the fraction
of solutions to all constraints, $\Pr_{x \in_u \{0,1\}^n}[\bigwedge_i M^i(x) = 1] = \Pr_{x \in_u \{0,1\}^n}[x \in U] \cdot \Pr_{x \sim D}[\bigwedge_i M^i(x) = 1]$,
where the equality holds because every common solution lies in $U$. The theorem now follows since we can compute $p$ in time $(n/\delta)^{O(k^2)}$ using Claim 2.3, as $D$ is a
small-space source of width at most $O(n^{3k})$ and $M$ has width at most $(n/\delta)^{O(k^2)}$. □
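For reference, the chain of inequalities behind the last step can be written out explicitly. The first inequality uses $M^i \le M^i_\delta$ pointwise (Theorem 4.2), the second is the union bound, and the third uses the density bound from Theorem 4.3.

```latex
\Pr_{x\sim D}\Big[\textstyle\bigwedge_i M^i(x)=1\Big]
  \;\le\; \Pr_{x\sim D}[M(x)=1]
  \;\le\; \Pr_{x\sim D}\Big[\textstyle\bigwedge_i M^i(x)=1\Big] + k\delta
  \;\le\; \big(1 + k\delta\,(n+1)^k\big)\,\Pr_{x\sim D}\Big[\textstyle\bigwedge_i M^i(x)=1\Big]
  \;=\; \Big(1+\tfrac{\varepsilon}{2}\Big)\,\Pr_{x\sim D}\Big[\textstyle\bigwedge_i M^i(x)=1\Big],
\qquad \text{for } \delta = \frac{\varepsilon}{2k(n+1)^k}.
```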

4.2 Proof of Theorem 4.2 

We start with some notation. Let $D$ denote the small-space generator of width at most $S$. For
$A \subseteq \{0,1\}^n$ we use $D(A)$ to denote the measure of $A$ under $D$. Let $U^0, \ldots, U^n$ be the vertices in
$D$, with $U^i$ being the $i$'th layer of $D$. For a vertex $u \in U^i$, let $D^u$ be the distribution over $\{0,1\}^{n-i}$
induced by taking a random walk in $D$ starting from $u$. Given a vertex $v \in L(M,i)$ and $u \in U^i$, let
$P_{M,u}(v)$ denote the probability of accepting if we start from $v$ and make transitions in $M$ according
to a suffix sampled from distribution $D^u$.

As we did for knapsack, we start from the exact branching program M and construct a sequence 
of programs $M^n = M, \ldots, M^0$, where $M^i$ is obtained from $M^{i+1}$ by rounding the $(i+1)$st layer.
We do the rounding in such a way that the acceptance probabilities are well approximated under 
each of the possible distributions on suffixes D u . The program M° will be a small width program. 

Let $L(M^{i+1}, i+1) = \{v_1 \prec v_2 \prec \cdots \prec v_w\}$. Fix a vertex $u \in U^{i+1}$. We define a set $B^{i+1}(u) =
\{v_{u(j)}\}_j \subseteq L(M^{i+1}, i+1)$ of breakpoints for $u$ as follows. We start with $v_{u(1)} = v_w$ and, given $v_{u(j)}$,
define $v_{u(j+1)}$ by

$$v_{u(j+1)} = \max\{v : v \prec v_{u(j)} \text{ and } 0 \le P_{M^{i+1},u}(v) < P_{M^{i+1},u}(v_{u(j)})/(1+\varepsilon)\}.$$

Let $B^{i+1} = \bigcup_{u \in U_{i+1}} B^{i+1}(u) = \{b_1 \prec \cdots \prec b_N\}$ be the union of the breakpoints for all $u$. We set $L(M^i, i+1) = B^{i+1}$. The vertices in all other layers stay the same as in $M^{i+1}$, as do all the edges except those from layer $i$ to $i+1$. We round these edges upward as follows: let $v \in L(M^{i+1}, i)$ and $M^{i+1}(v, b) = v' \in L(M^{i+1}, i+1)$. Find two consecutive vertices $b_k, b_{k+1} \in L(M^i, i+1)$ such that $b_k \prec v' \preceq b_{k+1}$. We set $M^i(v, b) = b_{k+1}$. Note that this only increases the number of accepting suffixes for $v$.
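The breakpoint rule and the upward rounding can be sketched in Python as follows, assuming each layer is given as positions $0, \ldots, w-1$ in the order $v_1 \prec \cdots \prec v_w$ and that the values $P_{M^{i+1},u}(\cdot)$ have already been computed for each source state $u$ (the function names are hypothetical). Scanning downward from $v_w$ means the first vertex that meets the condition is the $\prec$-maximal one, as the definition requires.

```python
import bisect
from fractions import Fraction

def breakpoints_for(P, eps):
    """Breakpoints B^{i+1}(u) for one source state u.

    P[idx] = P_{M^{i+1},u}(v_{idx+1}) in the order v_1 < v_2 < ... < v_w, so P is
    non-decreasing. Returns breakpoint positions, starting from the top vertex v_w.
    """
    bps = [len(P) - 1]                       # v_{u(1)} = v_w
    for idx in range(len(P) - 2, -1, -1):    # scan downwards: the first hit is the max
        if 0 < P[idx] < P[bps[-1]] / (1 + eps):
            bps.append(idx)
    return bps

def union_breakpoints(P_by_u, eps):
    """B^{i+1}: union of the breakpoints over all source states u, in layer order."""
    return sorted(set().union(*(breakpoints_for(P, eps) for P in P_by_u)))

def round_up(B, idx):
    """Redirect an edge entering position idx to b_{k+1}, where b_k < v' <= b_{k+1}."""
    return B[bisect.bisect_left(B, idx)]

eps = Fraction(1, 4)
P_u1 = [Fraction(s) for s in ("0", "1/10", "1/2", "11/20", "3/5", "1")]
P_u2 = [Fraction(s) for s in ("1/5", "3/10", "3/10", "9/10", "19/20", "1")]
B = union_breakpoints([P_u1, P_u2], eps)
print(B, round_up(B, 3))  # [0, 1, 2, 4, 5] 4
```

Since $v_w$ is always a breakpoint, `bisect_left` never runs past the end of `B`; in the example, the vertex at position 3 is redirected to position 4, whose acceptance probability under $u_1$ is within a $(1+\epsilon)$ factor of its own.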

This completes the construction of the $M^i$'s. We now analyze the running time of our algorithm. We start with the following claims, whose proofs are similar to those of Lemmas 3.1 and 3.3 and are omitted.

Lemma 4.4. The branching program $M^i$ is monotone, where the ordering of vertices in each layer is the same as in $M$.

Lemma 4.5. For $v \in M^i$, we have $A_{M^{i+1}}(v) \subseteq A_{M^i}(v)$. Thus $P_{M,u}(v) \le P_{M^i,u}(v)$ for all $u \in U_j$, where $j$ is the layer of $v$.

We next analyze the complexity of constructing M° from M. 

Lemma 4.6. The branching program $M^0$ can be constructed in time $O(n^2 S(S + \log W)\log(nS/\epsilon)/\epsilon)$.

Proof. Observe that for every $i$ and $u \in U_i$, $|B^i(u)| = O(n/\epsilon)$ and hence $|B^i| = O(nS/\epsilon)$. Let us analyze the complexity of constructing $M^i$ from $M^{i+1}$. We will assume inductively that the set $B^{i+2}$ is known and stored in a binary search tree along with the values $P_{M^{i+1},u}(b)$ for every $b \in B^{i+2}$ and $u \in U_{i+2}$. Hence, given $v \in L(M, i+2)$, we can find $b_k, b_{k+1} \in B^{i+2}$ such that $b_k \prec v \preceq b_{k+1}$ in time $O(\log(nS/\epsilon))$. This ensures that, given a vertex $v' \in L(M^{i+1}, i+1)$ and $u \in U_{i+1}$, we can compute $P_{M^{i+1},u}(v')$ in time $O(\log(nS/\epsilon))$. To see this, note that

$$P_{M^{i+1},u}(v') = \sum_{z \in \{0,1\}} p_u(z)\, P_{M^{i+1},u_z}\big(M^{i+1}(v', z)\big),$$

where $p_u(z)$ is the probability of the edge labeled $z$ out of $u$ in $D$ and $u_z \in U_{i+2}$ denotes the vertex reached in $D$ when taking that edge. To compute $M^{i+1}(v', z)$ we first compute $v = M(v', z)$, using the fact that $M$ is described implicitly. We then find $b_k \prec v \preceq b_{k+1}$ in $B^{i+2}$ and set $M^{i+1}(v', z) = b_{k+1}$. Since the values $P_{M^{i+1},u_z}(b)$ are precomputed, we can use them to compute $P_{M^{i+1},u}(v')$. The time required is dominated by the $O(\log(nS/\epsilon))$ time needed to find $b_{k+1}$.
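The evaluation of $P_{M^{i+1},u}(v')$ in the proof above can be sketched as follows, with a sorted Python list and `bisect` standing in for the binary search tree over $B^{i+2}$. Vertices are identified with their positions in the layer order, and `M_step`, `D_out` and `P_table` are hypothetical stand-ins for the implicit description of $M$, the edges of $D$, and the stored values $P_{M^{i+1},u_z}(b)$.

```python
import bisect
from fractions import Fraction

def prob_from(v_prime, u, M_step, D_out, B, P_table):
    """P_{M^{i+1},u}(v') = sum_z p_u(z) * P_{M^{i+1},u_z}(M^{i+1}(v', z)).

    M_step(v', z): next vertex in the exact program M (a position in layer order).
    D_out[u]: list of (z, p_u(z), u_z) for the edges of the source D out of u.
    B: sorted breakpoints B^{i+2}; P_table[(b, u_z)] = P_{M^{i+1},u_z}(b), precomputed.
    """
    total = Fraction(0)
    for z, p, u_z in D_out[u]:
        v = M_step(v_prime, z)                  # exact transition in M
        b = B[bisect.bisect_left(B, v)]         # b_k < v <= b_{k+1}: round up to b_{k+1}
        total += p * P_table[(b, u_z)]          # O(log |B|) work per edge
    return total
```

Each call performs one binary search per outgoing edge of $u$, matching the $O(\log(nS/\epsilon))$ cost claimed above for the two-edge case.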

Now, for each $u \in U_{i+1}$, by using binary search on the set of vertices as in Lemma 3.2, each new breakpoint in $B^{i+1}(u)$ can be found in time $O(\log(W)\log(nS/\epsilon))$. Thus finding the set $B^{i+1}$ takes time $O(nS\log(W)\log(nS/\epsilon)/\epsilon)$.

Once we find the set $B^{i+1}$, we store it as a binary search tree. We compute and store the values $P_{M^i,u}(b) = P_{M^{i+1},u}(b)$ for each $b \in B^{i+1}$ and $u \in U_{i+1}$ in time $O(nS^2\log(nS/\epsilon)/\epsilon)$.

Thus overall, the time required to construct $M^0$ from $M$ is $O(n^2 S(S + \log W)\log(nS/\epsilon)/\epsilon)$. □

We next show that the number of accepting solutions does not increase by too much. 

Lemma 4.7. For $v \in L(M^i, j)$ and $u \in U_j$, we have $P_{M^i,u}(v) \le P_{M,u}(v)(1+\epsilon)^{n-i}$.

Proof. It suffices to show that $P_{M^i,u}(v) \le P_{M^{i+1},u}(v)(1+\epsilon)$. This claim is trivial for $j \ge i+1$, since for such vertices $P_{M^i,u}(v) = P_{M^{i+1},u}(v)$. As in Lemma 3.4, the crux of the argument is the case $j = i$. Since $M^{i+1}$ and $M^i$ are identical up to layer $i$, the claim for $j < i$ will follow.

Fix $v \in L(M^i, i)$ and $u \in U_i$. Let $u_0, u_1$ denote the neighbors of $u$ in $D$. Then we have

$$P_{M^i,u}(v) = p_u(0)\, P_{M^i,u_0}\big(M^i(v,0)\big) + p_u(1)\, P_{M^i,u_1}\big(M^i(v,1)\big). \qquad (4.2)$$

We first bound $P_{M^i,u_0}(M^i(v,0))$. Let $b_1, b_4$ be the breakpoints in $B^{i+1}(u_0)$ such that $b_1 \prec M^{i+1}(v,0) \preceq b_4$, and let $b_2, b_3$ be the breakpoints in $B^{i+1}$ such that $b_2 \prec M^{i+1}(v,0) \preceq b_3$. Note that $M^i(v,0) = b_3$ by the construction of $M^i$. Since $B^{i+1}(u_0) \subseteq B^{i+1}$, we get $b_1 \preceq b_2 \prec M^{i+1}(v,0) \preceq b_3 \preceq b_4$. By the definition of breakpoints, we have

$$P_{M^{i+1},u_0}(b_4) \le (1+\epsilon)\, P_{M^{i+1},u_0}\big(M^{i+1}(v,0)\big),$$

and by the monotonicity of $M^{i+1}$,

$$P_{M^{i+1},u_0}(b_3) \le P_{M^{i+1},u_0}(b_4),$$

which together show that

$$P_{M^{i+1},u_0}(b_3) \le (1+\epsilon)\, P_{M^{i+1},u_0}\big(M^{i+1}(v,0)\big).$$

Since $b_3 \in L(M^i, i+1)$, we have $P_{M^i,u_0}(b_3) = P_{M^{i+1},u_0}(b_3)$. Thus

$$P_{M^i,u_0}\big(M^i(v,0)\big) \le (1+\epsilon)\, P_{M^{i+1},u_0}\big(M^{i+1}(v,0)\big).$$

Similarly, we can show

$$P_{M^i,u_1}\big(M^i(v,1)\big) \le (1+\epsilon)\, P_{M^{i+1},u_1}\big(M^{i+1}(v,1)\big).$$

Plugging these into Equation 4.2 gives

$$P_{M^i,u}(v) \le (1+\epsilon)\Big(p_u(0)\, P_{M^{i+1},u_0}\big(M^{i+1}(v,0)\big) + p_u(1)\, P_{M^{i+1},u_1}\big(M^{i+1}(v,1)\big)\Big) = (1+\epsilon)\, P_{M^{i+1},u}(v),$$

which is what we set out to prove. □

We can now prove Theorem 4.2. 

Proof. Choose $\epsilon = \Theta(\delta/n)$ so that $(1+\epsilon)^n \le 1+\delta$. We construct the program $M^0$ from $M$ and output $P_{M^0,u}(s)$, where $s$ is the start state of $M$ and $u$ is the start state of $D$. By Lemma 4.6, this takes time $O(n^3 S(S + \log W)\log(nS/\delta)/\delta)$. Applying Lemmas 4.7 and 4.5, we conclude that

$$P_{M,u}(s) \le P_{M^0,u}(s) \le P_{M,u}(s)(1+\delta).$$

Note that $P_{M,u}(s)$ and $P_{M^0,u}(s)$ are, respectively, the probabilities that $M$ and $M^0$ accept a string sampled from the distribution $D$. This completes the proof. □

5 Counting for General Integer Knapsack and Contingency Tables 

Our algorithms for counting also extend to general integer knapsack and contingency tables. Conceptually, the algorithms are similar to those for $\{0,1\}$-knapsack and multidimensional knapsack. However, the details are a little intricate, involving a combination of our ideas and Dyer's ideas. We defer the proofs to the appendix.

6 Learning Functions of Halfspaces via ROBPs 

We now present our learning algorithm and prove Theorem 1.5. We start with some notation. A halfspace $h: \{0,1\}^n \to \{0,1\}$ is a Boolean function defined by $h(x) = 1$ if $\sum_i a_i x_i \le b$ and $h(x) = 0$ otherwise, where $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$. Let $f: \{0,1\}^n \to \{0,1\}$ and let $\mu_i$ denote the uniform distribution over $\{0,1\}^i$. For each prefix $x \in \{0,1\}^i$, we define the function $f_x: \{0,1\}^{n-i} \to \{0,1\}$ by $f_x(z) = f(x \circ z)$, where $\circ$ denotes concatenation. For two functions $f, g: \{0,1\}^n \to \{0,1\}$, we define $d(f,g) = \Pr_{x \leftarrow \mu_n}[f(x) \neq g(x)]$. Thus, given two prefixes $x, y \in \{0,1\}^i$, $d(f_x, f_y) = \Pr_{z \leftarrow \mu_{n-i}}[f(x \circ z) \neq f(y \circ z)]$.
Definition 6.1 (Almost ROBPs). We call a function $f: \{0,1\}^n \to \{0,1\}$ an $(\epsilon, W, n)$-almost ROBP if there exist sets $S^i \subseteq \{0,1\}^i$, $i \in \{1, \ldots, n\}$, with $|S^i| \le W$ such that for every $y \in \{0,1\}^i$ there exists an $x \in S^i$ such that $d(f_x, f_y) \le \epsilon$. We call the sets $S^1, \ldots, S^n$ $(\epsilon, W)$-representatives.

It is interesting to contrast the notion of an almost ROBP (aROBP for short) with having a good approximation by an ROBP under the uniform distribution. It is easy to show that there exists a function $f$ which is $\delta$-close to a width-2 ROBP but which is not an $(\epsilon, W, n)$-aROBP for $W, \epsilon^{-1} = \mathrm{poly}(n, \delta^{-1})$: take the parity function and corrupt it randomly on some $\delta$ fraction of inputs. In the other direction, it is not obvious that an $(\epsilon, W, n)$-aROBP can be well approximated by a small-width ROBP under the uniform distribution. But this is in fact true, and the proof is via our learning algorithm, which we present below.

The algorithm learns an aROBP $f$, given query access to $f$, by constructing an ROBP $M$ that approximates $f$. The layers of $M$ are numbered $0$ through $n$. The set of vertices in layer $i$ is denoted by $L(M,i)$. Each vertex $x \in L(M,i)$ corresponds to a string $x \in \{0,1\}^i$. $L(M,0)$ consists of a single start state, identified with the null string $\phi$. By abuse of notation, we will think of $M$ both as a branching program and as a Boolean function.

Main Algorithm. Input: $n$, $\epsilon$, $W$.

1. Let $L(M,0)$ contain the null string, while $L(M,i) = \emptyset$ for $i \in \{1, \ldots, n\}$.
2. For $i = 1, \ldots, n$:
3.   For each $x \in L(M, i-1)$ and $b \in \{0,1\}$:
4.     Check if there is $y \in L(M,i)$ such that $d(f_{x \circ b}, f_y) \le 3\epsilon$.
5.     If so, add an edge labeled $b$ from $x$ to $y$.
6.     If not, add $x \circ b$ to $L(M,i)$ and add an edge labeled $b$ from $x$ to $x \circ b$.
7.   If $|L(M,i)| > W$, then output FAIL and halt.

Each vertex $y \in L(M,n)$ is a full string; it is labeled accepting if and only if $f(y) = 1$.

In line 4 of our algorithm, to test whether some vertex $y$ satisfies $d(f_{x \circ b}, f_y) \le 3\epsilon$, we pick $L$ random suffixes $z \in \{0,1\}^{n-i}$ and check whether $f(x \circ b \circ z) = f(y \circ z)$. By the Chernoff bound, if $L = O(\log(nW^2/\delta)/\epsilon)$, then the probability that our estimate of $d(f_{x \circ b}, f_y)$ is off by more than an additive $\epsilon$ is at most $\delta/2nW^2$. Since each layer has at most $W$ vertices, we estimate at most $2nW^2$ such quantities. Hence the probability that the error is more than $\epsilon$ in any of our estimates is at most $\delta$.
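The Main Algorithm can be sketched in Python as below. To keep the sketch deterministic, it replaces the random-suffix estimates of line 4 by exact distances computed by enumerating all suffixes, which takes time exponential in $n$ and is only meant for small illustrations; with exact distances, the argument of Claim 6.4 gives $d(M, f) \le 3n\epsilon$. Vertices are the prefixes themselves, and the names `dist`, `learn_robp` and `evaluate` are hypothetical.

```python
from itertools import product

def dist(f, n, x, y):
    """Exact d(f_x, f_y) for equal-length prefixes x, y (enumerates all suffixes)."""
    zs = list(product((0, 1), repeat=n - len(x)))
    return sum(f(x + z) != f(y + z) for z in zs) / len(zs)

def learn_robp(f, n, eps, W):
    """Main Algorithm with exact distances in place of sampled estimates.

    Returns the edge map (x, b) -> y between prefixes, or None for FAIL.
    """
    layer = [()]                                  # L(M, 0) = {null string}
    edges = {}
    for i in range(1, n + 1):
        new_layer = []
        for x in layer:
            for b in (0, 1):
                xb = x + (b,)
                y = next((w for w in new_layer if dist(f, n, xb, w) <= 3 * eps), None)
                if y is None:                     # no 3*eps-close vertex: open a new one
                    new_layer.append(xb)
                    y = xb
                edges[(x, b)] = y
        if len(new_layer) > W:
            return None                           # output FAIL
        layer = new_layer
    return edges

def evaluate(edges, f, x):
    """M(x): follow edges from the start state; the final vertex y is labeled f(y)."""
    state = ()
    for b in x:
        state = edges[(state, b)]
    return f(state)
```

For example, with $\epsilon = 0$ the learned program agrees with $f$ on every input, while for parity with $W = 1$ the sketch returns `None` (FAIL) already at layer 1, since the two one-bit prefixes are at distance 1.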

Theorem 6.2. For $\epsilon, \delta > 0$, given oracle access to an $(\epsilon, W, n)$-almost ROBP $f$, the above algorithm runs in time $O(nW\log(nW/\delta)/\epsilon)$ and constructs a $(W,n)$-ROBP $M$ such that $d(M,f) \le 4n\epsilon$ with probability at least $1-\delta$.

We assume that all our estimates are within $\epsilon$ of the true distances, which happens with probability at least $1-\delta$. The theorem follows from two claims.

Claim 6.3. The algorithm never outputs FAIL. 

Proof. Let $S^1, \ldots, S^n$ be $(\epsilon, W)$-representatives for $f$. For each $x \in S^i$, consider the ball $B(x) = \{y \in \{0,1\}^i : d(f_y, f_x) \le \epsilon\}$. By definition, these balls cover all of $\{0,1\}^i$. We claim that $L(M,i)$ cannot contain two distinct vertices $y, y' \in \{0,1\}^i$ that belong to the same ball $B(x)$. For if $y, y'$ lie in the same ball, then $d(f_y, f_{y'}) \le 2\epsilon$. Since the sampling error is at most $\epsilon$, our estimate of $d(f_y, f_{y'})$ would be at most $3\epsilon$, so we would not add both of them to $L(M,i)$. Hence $|L(M,i)| \le |S^i| \le W$. □

Claim 6.4. We have $\Pr_{x \leftarrow \mu_n}[M(x) \neq f(x)] \le 4n\epsilon$.
Proof. By induction on $n - i$, we will show that for every $x \in L(M,i)$, $d(M_x, f_x) \le 4(n-i)\epsilon$. This implies that when $i = 0$, $d(M,f) \le 4n\epsilon$, as desired.

For $i = n$ there is nothing to prove. Suppose the statement is true for all vertices in $L(M, i+1)$. Consider a vertex $x \in L(M,i)$, and let $y_0, y_1 \in L(M, i+1)$ be its neighbors in $M$. Then, by our assumption on sampling errors, for $b \in \{0,1\}$, $d(f_{x \circ b}, f_{y_b}) \le 4\epsilon$. By the induction hypothesis, we know that $d(f_{y_b}, M_{y_b}) \le 4(n-i-1)\epsilon$. Putting these together, we get

$$\begin{aligned} d(f_x, M_x) &= \frac{1}{2}\sum_{b \in \{0,1\}} d(f_{x \circ b}, M_{y_b}) \\ &\le \frac{1}{2}\sum_{b \in \{0,1\}} \big(d(f_{x \circ b}, f_{y_b}) + d(f_{y_b}, M_{y_b})\big) \quad \text{(by the triangle inequality)} \\ &\le 4\epsilon + 4(n-i-1)\epsilon = 4\epsilon(n-i). \end{aligned}$$

□

Theorem 6.2 now follows, as the probability of a sampling error is at most $\delta$. Our main learning result for halfspaces, Theorem 1.5, follows by combining Theorem 6.2 and the following easy claims. The first claim is implicit in [MZ10], who prove a stronger result about sandwiching halfspaces between ROBPs. We present a more direct proof below.

Claim 6.5. Every halfspace is an $(\epsilon, 1/\epsilon, n)$-almost ROBP.

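Proof. Fix a halfspace $f = \mathbb{1}\{\sum_l a_l x_l \le b\}$ and fix $i < n$. We show that there exist representatives $S^i$, $|S^i| \le 1/\epsilon$, for prefixes of length $i$. Let $g(v) = \Pr_{z \leftarrow \mu_{n-i}}[v + \sum_{l > i} a_l z_l > b]$. Observe that $g(v)$ is a non-decreasing function of $v$. Now, starting from $v_1 = \min_{x \in \{0,1\}^i} \sum_{l \le i} a_l x_l$, we inductively define $v_{j+1}$ to be the smallest prefix sum $v > v_j$ such that $g(v) > g(v_j) + \epsilon$. This gives at most $k \le 1/\epsilon$ values $v_j$. Now for each $j$, we choose $x^j$ to be some $x \in \{0,1\}^i$ such that $\sum_{l \le i} a_l x_l = v_j$. It is easy to see that $S^i = \{x^1, \ldots, x^k\}$ forms a set of representatives for prefixes of length $i$: if $y \in \{0,1\}^i$ has prefix sum $v$ and $v_j$ is the largest breakpoint with $v_j \le v$, then $f_y \le f_{x^j}$ pointwise and $d(f_{x^j}, f_y) = g(v) - g(v_j) \le \epsilon$. □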
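The construction in the proof of Claim 6.5 can be checked numerically on small halfspaces. The Python sketch below computes $g$ by enumerating suffixes, scans the achievable prefix sums in increasing order, and keeps one witness prefix per breakpoint; the function name and the brute-force evaluation of $g$ are illustrative assumptions, not part of the paper.

```python
from itertools import product

def halfspace_reps(a, b, i, eps):
    """Representatives S^i for f(x) = 1{sum_l a_l x_l <= b}, following Claim 6.5.

    g(v) = Pr_z[v + sum_{l>i} a_l z_l > b] is non-decreasing in the prefix sum v;
    a new breakpoint is kept whenever g exceeds the last kept value by more than eps.
    """
    suffixes = list(product((0, 1), repeat=len(a) - i))

    def g(v):
        return sum(v + sum(al * zl for al, zl in zip(a[i:], z)) > b
                   for z in suffixes) / len(suffixes)

    witness = {}                                  # prefix sum -> one prefix achieving it
    for x in product((0, 1), repeat=i):
        witness.setdefault(sum(al * xl for al, xl in zip(a, x)), x)
    reps = []
    for v in sorted(witness):                     # v_1 is the smallest prefix sum
        if not reps or g(v) > g(reps[-1]) + eps:
            reps.append(v)
    return [witness[v] for v in reps]
```

For instance, with $a = (4,3,2,2,1,1)$, $b = 6$, $i = 3$ and $\epsilon = 0.2$, every prefix of length 3 is within distance $0.2$ of one of the returned prefixes, and only $O(1/\epsilon)$ prefixes are returned.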

Claim 6.6. Let $f_1, \ldots, f_k: \{0,1\}^n \to \{0,1\}$ be $(\epsilon, W, n)$-aROBPs and $g: \{0,1\}^k \to \{0,1\}$. Then $h: \{0,1\}^n \to \{0,1\}$ defined by $h(x) = g(f_1(x), \ldots, f_k(x))$ is a $(2k\epsilon, W^k, n)$-aROBP.

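Proof. For $j \le k$, let $S_j^1, \ldots, S_j^n$ be $(\epsilon, W)$-representatives for $f_j$. Fix $i \le n$ and form a set of prefixes $T^i \subseteq \{0,1\}^i$ as follows: for every $x_1 \in S_1^i, \ldots, x_k \in S_k^i$, if

$$U(x_1, \ldots, x_k) = \{z \in \{0,1\}^i : d((f_j)_{x_j}, (f_j)_z) \le \epsilon \ \ \forall j \le k\}$$

is not empty, add a single element of $U(x_1, \ldots, x_k)$ to $T^i$.

By construction, $|T^i| \le W^k$. Further, for every $y \in \{0,1\}^i$, there exist $x_1 \in S_1^i, \ldots, x_k \in S_k^i$ such that $y \in U(x_1, \ldots, x_k)$. Let $u$ be the element of $U(x_1, \ldots, x_k)$ added to $T^i$. Then, by the triangle inequality, $d((f_j)_y, (f_j)_u) \le 2\epsilon$ for each $j$, and by a union bound, $d(h_y, h_u) \le \sum_j d((f_j)_y, (f_j)_u) \le 2k\epsilon$. □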

We observe that by combining the above arguments with those of Theorem 4.2, we get similar results for learning under any explicitly given small-space source. In particular, we can learn functions of halfspaces under $p$-biased and symmetric distributions.
A A FPTAS for General Integer Knapsack



In this section, we prove Theorem 1.3. As in the case of $\{0,1\}$-knapsack, we start with the exact branching program $M$ for $\mathrm{KNAP}(a, b, u)$, where each state in $L(M, j)$ corresponds to a partial sum $v_j = \sum_{i \le j} a_i x_i$ and has $u_{j+1} + 1$ outgoing edges corresponding to the possible values of the variable $x_{j+1}$. We then approximate this program with a small-width branching program. However, the program $M$ can have both large width and large degree. To handle this, we observe that $M$ is an interval ROBP in the sense defined below, which allows us to shrink the state space and obtain succinct descriptions of the edges of the new branching program we construct.

Definition A.1 (Interval ROBPs). For $u = (u_1, \ldots, u_n) \in \mathbb{Z}_+^n$ and $S, T \in \mathbb{Z}_+$, an $(S, u, T)$-interval ROBP $M$ is a layered multigraph with a layer for each $0 \le i \le T$ and at most $S$ states in each layer. The first layer has a single (start) vertex, each vertex in the last layer is labeled accepting or rejecting, and there exists a total order $\prec$ on the vertices of layer $i$ for $0 \le i \le T$. A vertex $v$ in layer $i-1$ has exactly $u_i + 1$ edges, labeled $\{0, 1, \ldots, u_i\}$, that respect the ordering $\prec$: if $M(v,k)$ denotes the $k$'th neighbor of $v$ for $k \le u_i$, then $M(v, u_i) \preceq M(v, u_i - 1) \preceq \cdots \preceq M(v, 0)$.

An interval ROBP defines a function $M: [0, u_1] \times [0, u_2] \times \cdots \times [0, u_n] \to \{0,1\}$ (over integer intervals), where on input $x$ we begin at the start vertex and output the label of the final vertex reached when traversing $M$ according to $x$.

Given an $(S, u, T)$-interval ROBP $M$ and a vertex $v \in L(M, i-1)$, the edges out of $v$ can be described succinctly by at most $S$ intervals, irrespective of how large $u_i$ is. If we let $E(v, w) = \{0 \le k \le u_i : M(v,k) = w\}$ be the set of edges from $v$ to $w$, then $E(v,w)$ is an interval, meaning $E(v,w) = \{l_{v,w}, \ldots, r_{v,w}\}$ for some $l_{v,w}, r_{v,w} \in \mathbb{Z}_+$. Thus, to describe $E(v,w)$ we only need to know $l_{v,w}$ and $r_{v,w}$. This allows us to succinctly describe the interval ROBP $M^0$ approximating $\mathrm{KNAP}(a, b, u)$ by storing just the endpoints of the edge sets $E(v,w)$ for $v, w \in M^0$.
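When the next layer has been rounded to a set of breakpoints, the intervals $E(v,w)$ out of a partial sum $v$ can be computed with two ceiling divisions per retained vertex, without touching the $u_i + 1$ individual edges. A minimal Python sketch, assuming positive weights, breakpoints listed as ascending partial sums, and ignoring the reject state for sums that exceed $b$ (the name `edge_intervals` is hypothetical):

```python
def edge_intervals(v, a_i, u_i, bps):
    """Endpoints of E(v, w) for every retained vertex w of the next layer.

    Assumes a_i > 0 and bps sorted ascending with bps[0] <= v. The edge labeled k
    goes to the largest breakpoint w <= v + a_i*k (the last interval is unbounded).
    Returns {w: (l_vw, r_vw)}: at most len(bps) entries, however large u_i is.
    """
    def first_label(w):                       # smallest k >= 0 with v + a_i*k >= w
        return max(0, -((v - w) // a_i))     # ceil((w - v) / a_i), clipped at 0
    out = {}
    for idx, w in enumerate(bps):
        lo = first_label(w)
        hi = u_i if idx + 1 == len(bps) else min(u_i, first_label(bps[idx + 1]) - 1)
        if lo <= hi:
            out[w] = (lo, hi)
    return out

print(edge_intervals(3, 4, 6, [0, 5, 9, 20]))  # {0: (0, 0), 5: (1, 1), 9: (2, 4), 20: (5, 6)}
```

Here the labels $k = 0, \ldots, 6$ produce the sums $3, 7, 11, \ldots, 27$, which round down to the breakpoints $0, 5, 9, 9, 9, 20, 20$; only four endpoint pairs are stored.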

Let $M$ denote the exact branching program for $\sum_{i \le n} a_i x_i \le b$, with edges between layers $i-1$ and $i$ labeled by $x_i \in \{0, \ldots, u_i\}$; for $v \in L(M, i-1)$ and $0 \le x_i \le u_i$ we have $M(v, x_i) = v + a_i x_i \in L(M, i)$. Given a vertex $v \in L(M, j)$, we use $P_M(v)$ to denote the probability that $M(v, z)$ accepts, for $z$ chosen uniformly at random from $\{0, \ldots, u_{j+1}\} \times \cdots \times \{0, \ldots, u_n\}$. It is clear from the definition that $M$ is an interval ROBP with the ordering on $L(M,i)$ given by $v \prec v'$ if $v > v'$.

We construct a series of interval ROBPs $M^n = M, M^{n-1}, \ldots, M^0$. We obtain $M^i$ from $M^{i+1}$ by rounding the states in $L(M^{i+1}, i+1)$. More precisely, we set $L(M^i, i+1) = \{v_1, \ldots, v_\ell\} \subseteq L(M^{i+1}, i+1)$, where the $v_j$'s are defined as follows. Let $v_1 = 0$. Given $v_j$, let

$$v_{j+1} = \min v \ \text{ such that } \ v > v_j \ \text{ and } \ 0 < P_{M^{i+1}}(v) < \frac{P_{M^{i+1}}(v_j)}{1+\eta}. \qquad (A.1)$$

Let $I_1 = \{v_1, \ldots, v_2 - 1\}, \ldots, I_\ell = \{v_\ell, \ldots\}$, where $\ell \le n(\log U)/\eta$, as $P_{M^i}(v_1) \le 1$ and $P_{M^i}(v_\ell) \ge U^{-n}$. Next we redirect the transitions going from level $i$ to level $i+1$. If we have an edge labeled $z \in \{0, \ldots, u_{i+1}\}$ entering a vertex $v \in I_j$, then we redirect the edge to the vertex $v_j$. The redirection will be done implicitly, in the sense that for any vertex $v$ in level $i$ and a vertex $v_j$, we only compute and store the endpoints of the interval $E(v, v_j) = \{0 \le k \le u_{i+1} : M^i(v,k) = v_j\}$.

Our branching programs have the following approximation properties, analogous to Lemmas 3.1, 3.3 and 3.4. The proofs are similar and are omitted.

Lemma A.2. For any $v \in L(M^i, j)$ and $0 \le k \le l \le u_{j+1}$, $M^i(v,k) \le M^i(v,l)$. Let $v, v' \in L(M^i, j)$ with $v \le v'$. Then for any suffix $z$, $M^i(v,z) \le M^i(v',z)$.

Lemma A.3. For $v \in M^i$, we have $A_{M^{i+1}}(v) \subseteq A_{M^i}(v)$. Further, for any $v \in L(M^i, j)$ where $j \le i$, we have $P_M(v) \le P_{M^i}(v) \le P_M(v)(1+\eta)^{n-i}$.
We next show that $M^0$ can be constructed efficiently.

Lemma A.4. Each vertex $v_j \in L(M^i, i+1)$ can be computed in time $O(n(\log U)(\log W)/\eta)$.

Proof. The proof is by induction: we maintain the invariant that for every $i$, we know the vertices $v_j$ of $L(M^i, i+1)$ and their acceptance probabilities $P_{M^i}(v_j)$.

Suppose we have the above setup for $l > i$ and have computed $v_1, \ldots, v_j \in L(M^i, i+1)$. Recall that $v_{j+1}$ is the smallest value of $v > v_j$ satisfying $P_{M^{i+1}}(v) < P_{M^{i+1}}(v_j)/(1+\eta)$. We next show that, for a given $v \in L(M^{i+1}, i+1)$, $P_{M^{i+1}}(v)$ can be computed in time $O(n(\log U)/\eta)$. Let $L(M^{i+1}, i+2) = \{w_1 < w_2 < \cdots\}$; then $E(v, w_l) = \{0 \le k \le u_{i+2} : w_l \le v + a_{i+2}k < w_{l+1}\}$ and

$$P_{M^{i+1}}(v) = \sum_{w \in L(M^{i+1}, i+2)} \frac{|E(v,w)|}{u_{i+2} + 1}\, P_{M^{i+1}}(w).$$

Thus, we can compute $P_{M^{i+1}}(v)$ in time $O(n(\log U)/\eta)$, as $|L(M^{i+1}, i+2)| \le n(\log U)/\eta$.

Since $P_{M^{i+1}}(v)$ is non-increasing in $v$, we can now do binary search on $P_{M^{i+1}}(v)$. Since we start with integers in the range $\{0, \ldots, W\}$, this will require $O(\log W)$ computations of $P_{M^{i+1}}(v)$. Once we have computed $L(M^i, i+1)$, we store these vertices and their probabilities of acceptance. □
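The binary search in the proof of Lemma A.4 only needs $P_{M^{i+1}}$ to be non-increasing in the partial sum, which follows from the monotonicity in Lemma A.2. A Python sketch under that assumption, with `P` a hypothetical callable returning exact probabilities and `next_breakpoint` a hypothetical name:

```python
from fractions import Fraction

def next_breakpoint(P, v_j, W, eta):
    """Smallest v in (v_j, W] with 0 < P(v) < P(v_j)/(1 + eta), or None.

    Assumes P is non-increasing on {0, ..., W}, so the binary search needs only
    O(log W) evaluations of P.
    """
    thr = P(v_j) / (1 + eta)
    lo, hi = v_j + 1, W + 1      # the first v with P(v) < thr (or W + 1) lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if P(mid) < thr:
            hi = mid
        else:
            lo = mid + 1
    if lo <= W and P(lo) > 0:
        return lo
    return None                  # from here on P is 0: no further breakpoint

# Example: P(v) = 1/(v+1) on {0, ..., 10} and 0 beyond; with eta = 1 the breakpoints are 0, 2, 6.
P = lambda v: Fraction(1, v + 1) if v <= 10 else Fraction(0)
print(next_breakpoint(P, 0, 20, 1), next_breakpoint(P, 2, 20, 1), next_breakpoint(P, 6, 20, 1))  # 2 6 None
```

The final check on `P(lo) > 0` implements the condition $0 < P_{M^{i+1}}(v)$ in (A.1): by monotonicity, once the first value below the threshold is $0$, every later value is $0$ as well.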

Thus we can construct $M^0$ from $M$ in time $O(n^3(\log U)^2(\log W)/\eta^2)$.

We can now finish the proof of our counting result for general integer knapsack. 

Proof of Theorem 1.3. We set $\eta = \delta/2n$ and use the above arguments to construct the branching program $M^0$ and compute the value of $P_{M^0}(s)$, where $s$ is the start state. We now apply Lemma A.3 to conclude that

$$P_M(s) \le P_{M^0}(s) \le P_M(s)(1+\eta)^n \le (1+\delta)P_M(s),$$

where the last inequality holds for small enough $\delta$. Finally, note that the number of knapsack solutions is precisely $P_M(s)\prod_i (u_i + 1)$. Hence we output $P_{M^0}(s)\prod_i (u_i + 1)$. □
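To see the whole rounding scheme for general integer knapsack at work, the following Python sketch runs it on a tiny instance and compares the output with the exact count. It is a simplification under stated assumptions: it keeps every partial sum $0, \ldots, b$ explicitly (so it is only suitable for small $b$) instead of the succinct interval representation and binary search of Lemma A.4, and it sends sums whose acceptance probability is $0$ directly to rejection. The rounding rule (A.1), the choice $\eta = \delta/2n$ and the final scaling by $\prod_i (u_i + 1)$ follow the text; the function names are hypothetical.

```python
from fractions import Fraction
from math import prod

def exact_count(a, u, b):
    """#{x : 0 <= x_i <= u_i, sum_i a_i x_i <= b} by dynamic programming over partial sums."""
    ways = {0: 1}
    for ai, ui in zip(a, u):
        nxt = {}
        for s, c in ways.items():
            for x in range(ui + 1):
                if s + ai * x <= b:
                    nxt[s + ai * x] = nxt.get(s + ai * x, 0) + c
        ways = nxt
    return sum(ways.values())

def breakpoints(P, eta):
    """(A.1): v_1 = 0 and v_{j+1} = min v > v_j with 0 < P[v] < P[v_j] / (1 + eta)."""
    if P[0] == 0:
        return []
    bps = [0]
    for v in range(1, len(P)):
        if 0 < P[v] < P[bps[-1]] / (1 + eta):
            bps.append(v)
    return bps

def approx_count(a, u, b, delta):
    """Round layers n, n-1, ..., 1 in turn and return P_{M^0}(s) * prod(u_i + 1)."""
    eta = Fraction(delta) / (2 * len(a))          # eta = delta / 2n, as in the proof
    P = [Fraction(1)] * (b + 1)                   # layer n: every partial sum <= b accepts
    for j in range(len(a) - 1, -1, -1):
        bps, k = breakpoints(P, eta), -1
        rounded = [Fraction(0)] * (b + 1)
        for v in range(b + 1):                    # redirect v in I_k to the breakpoint v_k
            while k + 1 < len(bps) and bps[k + 1] <= v:
                k += 1
            if P[v] > 0:
                rounded[v] = P[bps[k]]
        P = [sum((rounded[v + a[j] * x] for x in range(u[j] + 1) if v + a[j] * x <= b),
                 Fraction(0)) / (u[j] + 1) for v in range(b + 1)]  # average over x_{j+1}
    return P[0] * prod(ui + 1 for ui in u)
```

Because every edge is rounded toward a smaller partial sum, the estimate never undercounts; the upper bound $(1+\eta)^n \le 1 + \delta$ is the content of Lemma A.3.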

B A Deterministic Algorithm for Counting Contingency Tables 

We now address the question of counting contingency tables. Our algorithm is fairly intricate and involves a combination of Dyer's FPRAS for counting contingency tables and our algorithms for counting general integer knapsack solutions and counting knapsack solutions under small-space sources. Here is a high-level outline of the algorithm:

• We first give an algorithm for counting integer knapsack solutions under "interval small-space sources", which are integer-valued distributions that generalize small-space sources in the same vein as the interval ROBPs of Definition A.1 generalize ROBPs. However, for clarity, we specialize our analysis to the specific case of contingency tables.

• We then observe that Dyer's approach for counting contingency tables (implicitly) gives an explicit "interval small-space source" $D$ whose support contains all feasible contingency tables, and under which the set of feasible contingency tables has non-negligible density. We then combine the above two observations as in the proof of Theorem 1.2.
We first set up some notation. Following Dyer, we will solve the following formulation of counting contingency tables. Given $r = (r_1, \ldots, r_m) \in \mathbb{Z}_+^m$ and $c = (c_1, \ldots, c_n) \in \mathbb{Z}_+^n$, estimate $|CT'(r,c)|$, where

$$CT'(r,c) = \Big\{X \in \mathbb{Z}_+^{m \times n} : \sum_{j=1}^n X_{ij} \le r_i,\ i \in [m];\ \ \sum_{i=1}^m X_{ij} = c_j,\ j \in [n]\Big\}.$$

This is equivalent to the original problem as stated in the introduction, since for $r \in \mathbb{Z}_+^m$ and $c' = (c_1, \ldots, c_{n+1}) \in \mathbb{Z}_+^{n+1}$, $|CT(r,c')| = |CT'(r,c)|$, where $c = (c_1, \ldots, c_n)$. For $X \in \mathbb{Z}_+^{m \times n}$, let $X^i$ denote the $i$'th row and $X_j$ denote the $j$'th column of $X$. Let $R = \max(r_i, c_j : i \in [m], j \in [n])$. For $j \in [n]$, let $T(j) = \{x \in \mathbb{Z}_+^m : \sum_i x_i = c_j\}$ and let $B = \{X : X_j \in T(j),\ j \in [n]\}$. Further, define $h: \mathbb{Z}_+^m \to \{0, \ldots, 2n^2\}^m = T$ by $h(x)_i = \lfloor 2n^2 x_i / r_i \rfloor$, and let

$$S = \Big\{X \in B : \sum_{j=1}^n h(X_j) \le 2n^2 \mathbf{1}\Big\},$$

where $\mathbf{1}$ is the all-ones vector and the inequality is componentwise.

We now state a sequence of lemmas that we need for our proof. The first is due to Dyer [Dye03]. At a high level, it lets us use the uniform distribution over $S$ in the role of a small-space source.

Lemma B.1 (Dyer). $CT'(r,c) \subseteq S$ and $|S| \le n^m |CT'(r,c)|$. Further, we can estimate $|S|$ deterministically in time $n^{O(m)}$.

Given this lemma, it suffices to additively approximate the number of valid contingency tables under $S$, which we do by constructing suitable ROBPs. The next lemma gives us explicit small-width interval ROBPs $M_i$ that approximate the $i$'th row constraint under the uniform distribution over $S$.

Lemma B.2. For every $i \in [m]$, we can in time $n^{O(m)}(\log^3 R)/\eta^2$ explicitly compute an $(n^{O(m)}(\log R)/\eta, c, n)$-interval ROBP $M_i$ such that for every $x \in \mathbb{Z}_+^n$, $\sum_j x_j \le r_i$ implies $M_i(x) = 1$, and

$$\Pr_{X \in_U S}[M_i(X^i) = 1] \le (1+\eta)^n \Pr_{X \in_U S}\Big[\sum_j X_{ij} \le r_i\Big]. \qquad (B.1)$$

Next we show how to efficiently compute the probability of all the $M_i$'s accepting simultaneously under $S$.

Lemma B.3. We can in time $(n^{O(m)}(\log R)/\eta)^m$ compute $\Pr_{X \in_U S}[\bigwedge_{i=1}^m M_i(X^i) = 1]$.

We first prove Theorem 1.4 assuming these lemmas.

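Proof of Theorem 1.4. Set $\eta = \epsilon/mn^{m+1}$ in Lemma B.2 to obtain interval ROBPs $M_1, \ldots, M_m$ satisfying Equation B.1. Then $X \in CT'(r,c)$ implies $M_i(X^i) = 1$ for $i \in [m]$, and by a union bound (using $(1+\eta)^n - 1 \le 2n\eta$ for $n\eta \le 1$),

$$\Pr_{X \in_U S}\Big[\bigwedge_{i=1}^m M_i(X^i) \neq \mathbb{1}\{X \in CT'(r,c)\}\Big] \le m\big((1+\eta)^n - 1\big) \le 2mn\eta = \frac{2\epsilon}{n^m}.$$

On the other hand, by Lemma B.1,

$$\Pr_{X \in_U S}[X \in CT'(r,c)] \ge \frac{1}{n^m}.$$

Combining the above two equations, we get that

$$\Pr_{X \in_U S}[X \in CT'(r,c)] \le \Pr_{X \in_U S}\Big[\bigwedge_{i=1}^m M_i(X^i) = 1\Big] \le (1+2\epsilon)\Pr_{X \in_U S}[X \in CT'(r,c)].$$

Thus, $p = |S| \cdot \Pr_{X \in_U S}[\bigwedge_{i=1}^m M_i(X^i) = 1]$ is a $2\epsilon$-relative error approximation for $|CT'(r,c)| = |S| \cdot \Pr_{X \in_U S}[X \in CT'(r,c)]$; running the argument with $\epsilon/2$ in place of $\epsilon$ gives relative error $\epsilon$. The theorem now follows since, by Lemmas B.1 and B.3, we can compute $p$ deterministically in time $(n^{O(m)}(\log R)/\epsilon)^m$. □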
B.1 Proof of Lemma B.2



We show how to construct $M_1$; the constructions of $M_2, \ldots, M_m$ are similar. As in Section A, we start with an interval ROBP $M$ that exactly computes the function $M(x) = \mathbb{1}\{\sum_j x_j \le r_1\}$ and compute a sequence of interval ROBPs $M^n = M, M^{n-1}, \ldots, M^0$, where $M^i$ is obtained by rounding $M^{i+1}$. The final ROBP $M^0$ will have width at most $n^{O(m)}(\log R)/\eta$. Throughout this section, without explicitly saying so, we assume that all interval ROBPs are stored and computed succinctly, as was done in Section A.

We now describe how to get $M^i$ from $M^{i+1}$. For $u \in T$ and $l \in [n]$, let $D(u, l)$ be the distribution of $(X_{l+1}, \ldots, X_n) \in T(l+1) \times \cdots \times T(n)$ for $X \in_U S$ conditioned on $\sum_{k \le l} h(X_k) = u$. Further, let $D_1(u, l)$ denote the distribution of the first row of $Y$ for $Y \leftarrow D(u, l)$. For a vertex $v$ in layer $l$ of $M^j$, $j \le n$, and $u \in T$, let

$$P_{M^j,u}(v) = \Pr_{x \leftarrow D_1(u,l)}[\text{starting from } v,\ x \text{ leads to an accepting state in } M^j].$$

Let $L(M^{i+1}, i+1) = \{v_1 \prec v_2 \prec \cdots \prec v_w\}$. Fix $u \in T$ and define a set $B(u) = \{v_{u(j)}\} \subseteq L(M^{i+1}, i+1)$ of breakpoints for $u$ as follows. Start with $v_{u(1)} = v_w$ and, given $v_{u(j)}$, define $v_{u(j+1)}$ by

$$v_{u(j+1)} = \max v \ \text{ such that } \ v \prec v_{u(j)} \ \text{ and } \ 0 < P_{M^{i+1},u}(v) < \frac{P_{M^{i+1},u}(v_{u(j)})}{1+\eta}. \qquad (B.2)$$

We set $L(M^i, i+1) = \bigcup_{u \in T} B(u)$ to be the union of the breakpoints for all $u$. Let $L(M^i, i+1) = \{b_1 \prec \cdots \prec b_N\}$. Note that $N \le (2n^2)^m (n\log R)/\eta$. The vertices in all other layers stay the same as in $M^{i+1}$, as do all the edges except those from layer $i$ to $i+1$. We round these edges upward as before: let $v \in L(M^{i+1}, i)$ and, for an edge label $b$, let $M^{i+1}(v, b) = v' \in L(M^{i+1}, i+1)$. Find two consecutive vertices $b_k, b_{k+1} \in L(M^i, i+1)$ such that $b_k \prec v' \preceq b_{k+1}$. We set $M^i(v, b) = b_{k+1}$. Since the analysis of $M^0$ is similar to the analysis of Lemmas 4.5, 4.7 and A.3, we only analyze the complexity of constructing $M^0$, which is a little trickier. We need the following preliminary lemmas, which are implicit in Dyer's FPRAS for contingency tables.

Lemma B.4 (Implicit in Dyer). For $j \in [n]$ and intervals $I_1, \ldots, I_m \subseteq [0, R]$, we can estimate $\Pr_{y \in_U T(j)}[\bigwedge_{k=1}^m (y_k \in I_k)]$ in time $O(m2^m)$.

Proof. Follows from an argument similar to that of Lemma 4 in Dyer. □
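One way to realize Lemma B.4 is inclusion-exclusion over the $2^m$ subsets of coordinates that exceed their upper bounds, which matches the $O(m2^m)$ bound. The Python sketch below is such a realization (not necessarily Dyer's exact argument; the function names are hypothetical). Each level set $\{y : h(y) = z\}$ is a box of this form, so the same routine also yields the counts $\delta_j(t)$ used in Lemma B.5.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def count_in_box(c, lows, highs):
    """#{y in Z_+^m : sum(y) = c, lows[k] <= y[k] <= highs[k]} by inclusion-exclusion."""
    m = len(lows)
    rest = c - sum(lows)                        # shift y_k = lows[k] + y'_k with y'_k >= 0
    widths = [hi - lo + 1 for lo, hi in zip(lows, highs)]
    if rest < 0 or min(widths) <= 0:
        return 0
    total = 0
    for size in range(m + 1):                   # 2^m terms, each O(m) work
        for S in combinations(range(m), size):  # S = coordinates forced above their box
            top = rest - sum(widths[k] for k in S)
            if top >= 0:
                total += (-1) ** size * comb(top + m - 1, m - 1)
    return total

def prob_in_box(c, lows, highs):
    """Pr_{y uniform on T(j) = {y in Z_+^m : sum(y) = c}}[lows[k] <= y[k] <= highs[k] for all k]."""
    m = len(lows)
    return Fraction(count_in_box(c, lows, highs), comb(c + m - 1, m - 1))
```

For example, among the $\binom{9}{2} = 36$ vectors $y \in \mathbb{Z}_+^3$ with $y_1 + y_2 + y_3 = 7$, exactly 12 satisfy $1 \le y_1 \le 4$, $0 \le y_2 \le 3$, $2 \le y_3 \le 5$, so `prob_in_box` returns $1/3$.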

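Lemma B.5 (Implicit in Dyer). For $u, z \in T$, $l \in [n]$ and $(X_{l+1}, \ldots, X_n) \leftarrow D(u, l)$, we can estimate $\Pr[h(X_{l+1}) = z]$ in time $O((2n)^{4m+1})$.

Proof. For $t \in T$, let $\delta_j(t) = |\{x \in T(j) : h(x) = t\}|$ and

$$f(k, t) = \Big|\Big\{(y_{k+1}, \ldots, y_n) \in T(k+1) \times \cdots \times T(n) : \sum_{i=k+1}^n h(y_i) \le 2n^2\mathbf{1} - t\Big\}\Big|.$$

Then $\delta_j(t)$ can be computed in time $O(m2^m)$ by the above lemma. Further, $f(n, t) = 1$ if $t \le 2n^2\mathbf{1}$ and $0$ otherwise, and for $k < n$, $t \in T$,

$$f(k, t) = \sum_{s \in T} \delta_{k+1}(s)\, f(k+1, t+s),$$

where $f(k+1, t') = 0$ whenever $t' \notin T$. Therefore, we can compute $f(k, t)$ for all $k \in [n]$, $t \in T$ in time $n^{4m+1}(m2^m) = O((2n)^{4m+1})$. The lemma now follows as

$$\Pr[h(X_{l+1}) = z] = \frac{\delta_{l+1}(z)\, f(l+1, u+z)}{f(l, u)}.$$

□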
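The dynamic program in the proof of Lemma B.5 can be written compactly with memoization. The Python sketch below computes $\delta_j$ by brute force instead of via Lemma B.4, and memoizes $f(k, t)$ only on the vectors $t$ that are actually reached rather than tabulating all of $T$; both are simplifications for small illustrative inputs, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from functools import lru_cache
from itertools import product

def dyer_tables(r, c):
    """h, delta_j and f(k, t) from the proof of Lemma B.5 (brute-force delta, small inputs)."""
    m, n = len(r), len(c)
    cap = 2 * n * n

    def h(x):                                    # h(x)_i = floor(2 n^2 x_i / r_i)
        return tuple(cap * xi // ri for xi, ri in zip(x, r))

    # delta[j][t] = #{x in T(j+1) : h(x) = t}; Python lists are 0-indexed
    delta = [Counter(h(x) for x in product(range(cj + 1), repeat=m) if sum(x) == cj)
             for cj in c]

    @lru_cache(maxsize=None)
    def f(k, t):
        # number of (y_{k+1}, ..., y_n) with t + sum_j h(y_j) <= 2n^2 componentwise
        if any(ti > cap for ti in t):
            return 0
        if k == n:
            return 1
        return sum(cnt * f(k + 1, tuple(p + q for p, q in zip(t, s)))
                   for s, cnt in delta[k].items())

    return h, delta, f

def next_column_law(r, c, l, u):
    """Law of h(X_{l+1}) under D(u, l): delta_{l+1}(z) * f(l+1, u+z) / f(l, u)."""
    _, delta, f = dyer_tables(r, c)
    denom = f(l, tuple(u))                       # assumes u is an achievable prefix sum
    return {z: Fraction(cnt * f(l + 1, tuple(p + q for p, q in zip(u, z))), denom)
            for z, cnt in delta[l].items()}
```

In particular, $f(0, \mathbf{0}) = |S|$, which is how the same recursion also gives the count of $S$ needed in Lemma B.1.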
We next show how to compute the transition probabilities $P_{M^i,u}(v)$ efficiently.

Lemma B.6. For $i \in [n]$ and $u \in T$, we can compute $L(M^i, i+1)$ and $P_{M^i,u}(v)$ for $v \in L(M^i, i+1)$ in time $n^{O(m)}(\log R)^3/\eta^2$.

Proof. The proof is by induction on $n - i$. For $i = n$ there is nothing to show; suppose that for each vertex $w \in L(M^i, i+1)$ and $u \in T$ we know the values $P_{M^i,u}(w)$. Fix $v \in L(M^i, i)$ and let $X = (X_{i+1}, \ldots, X_n) \leftarrow D(u, i)$.

Now, from the definition of $D(u, i)$, given $h(X_{i+1}) = z$, the tuple $(X_{i+2}, \ldots, X_n)$ is independent of $X_{i+1}$, and moreover the distribution of $(X_{i+2}, \ldots, X_n)$ is precisely $D(u+z, i+1)$. Therefore,

$$P_{M^i,u}(v) = \sum_{w \in L(M^i, i+1)} \sum_{z \in T} \Pr\big[X_{1,i+1} \in E(v,w) \wedge h(X_{i+1}) = z\big] \cdot P_{M^i,u+z}(w), \qquad (B.3)$$

where $E(v, w) = \{y \in \mathbb{Z}_+ : M^i(v, y) = w\}$. Now, by the construction of $M^i$, $|L(M^i, i+1)| \le (2n^2)^m (n\log R)/\eta = n^{O(m)}(\log R)/\eta$, and by the induction hypothesis we know the values $P_{M^i,u+z}(w)$ for all $w \in L(M^i, i+1)$. Thus it is enough to show that we can compute $\Pr_X[X_{1,i+1} \in E(v,w) \wedge h(X_{i+1}) = z]$ efficiently for every $w$.

Fix $w = w_j \in L(M^i, i+1)$. Then $E(v, w_j) = \{y_1 \in \mathbb{Z}_+ : w_{j-1} \prec v + y_1 \preceq w_j\}$ is an interval whose endpoints we know; let $I_1 = E(v, w_j) \cap \{y : h(y)_1 = z_1\}$. Then, for

$$q = \Pr\big[X_{1,i+1} \in E(v, w_j) \wedge h(X_{i+1}) = z\big],$$

we have

$$\begin{aligned} q &= \Pr[h(X_{i+1}) = z] \cdot \Pr\big[X_{1,i+1} \in E(v, w_j) \mid h(X_{i+1}) = z\big] \\ &= \Pr[h(X_{i+1}) = z] \cdot \Pr_{y \in_U T(i+1)}\big[y_1 \in E(v, w_j) \mid h(y) = z\big] \qquad (B.4) \\ &= \Pr[h(X_{i+1}) = z] \cdot \frac{\Pr_{y \in_U T(i+1)}\big[y_1 \in I_1 \wedge \bigwedge_{k \ge 2} (h(y)_k = z_k)\big]}{\Pr_{y \in_U T(i+1)}[h(y) = z]}, \end{aligned}$$

where the second equality holds because, conditioned on $h(X_{i+1}) = z$, $X_{i+1}$ is independent of $(X_{i+2}, \ldots, X_n)$ and uniformly distributed on $\{y \in T(i+1) : h(y) = z\}$.

Now, as $\{y_k : h(y)_k = z_k\} = \{y_k \in \mathbb{Z}_+ : r_k z_k/(2n^2) \le y_k < r_k(z_k + 1)/(2n^2)\}$, it follows from Lemma B.4 that we can compute the second term above in time $n^{O(m)}$. Further, we can compute the first term efficiently by Lemma B.5. Thus, $q$ can be computed in time $n^{O(m)}$.

Thus, for each $v \in L(M^i, i)$ and $u \in T$, we can compute $P_{M^i,u}(v)$ in time $n^{O(m)}(\log R)/\eta$, and hence, for a fixed $u \in T$, each new breakpoint in $B(u)$ can be found in time $n^{O(m)}(\log R)^2/\eta$ using binary search as in Lemma 3.2. Further, as $|L(M^{i-1}, i)| \le n^{O(m)}(\log R)/\eta$, we can compute $L(M^{i-1}, i)$ in time $n^{O(m)}(\log R)^3/\eta^2$, as claimed. □

Lemma B.2 now follows from the above lemma and a straightforward extension of the arguments 
of Lemmas 4.4, 4.5, 4.7. We omit these details. 

B.2 Proof of Lemma B.3 

The proof is similar to that of Lemma B.6: we show by induction that for $i \in [n]$, $v_k \in L(M_k, i)$ for $1 \le k \le m$, $u \in T$ and $X \leftarrow D(u, i)$, we can compute

$$P(v_1, \ldots, v_m, u) = \Pr\big[\text{for all } k \in [m], \text{ row } k \text{ of } X \text{ leads to an accepting state when starting from } v_k \text{ in } M_k\big].$$
For $i = n$ there is nothing to show; suppose the statement is true for $j \ge i+1$. Fix $v_1 \in L(M_1, i), \ldots, v_m \in L(M_m, i)$ and $u \in T$. For $w_k \in L(M_k, i+1) = L_k$, let $E_k(v_k, w_k) = \{y \in \mathbb{Z}_+ : M_k(v_k, y) = w_k\}$. Then, similar to Equation B.3, for $X = (X_{i+1}, \ldots, X_n) \leftarrow D(u, i)$,

$$P(v_1, \ldots, v_m, u) = \sum_{(w_1, \ldots, w_m) : w_k \in L_k} \sum_{z \in T} \Pr\Big[\bigwedge_k \big(X_{k,i+1} \in E_k(v_k, w_k)\big) \wedge h(X_{i+1}) = z\Big] \cdot P(w_1, \ldots, w_m, u+z). \qquad (B.6)$$

We next show that for fixed $w_1 \in L_1, \ldots, w_m \in L_m$ and $z \in T$, we can compute $q = \Pr[\bigwedge_k (X_{k,i+1} \in E_k(v_k, w_k)) \wedge h(X_{i+1}) = z]$ in time $n^{O(m)}$. Let $I_k = E_k(v_k, w_k) \cap \{y : h(y)_k = z_k\}$. Then, similar to Equation B.4, we have

$$\begin{aligned} q &= \Pr[h(X_{i+1}) = z] \cdot \Pr\Big[\bigwedge_k \big(X_{k,i+1} \in E_k(v_k, w_k)\big) \,\Big|\, h(X_{i+1}) = z\Big] \\ &= \Pr[h(X_{i+1}) = z] \cdot \Pr_{y \in_U T(i+1)}\Big[\bigwedge_k \big(y_k \in E_k(v_k, w_k)\big) \,\Big|\, h(y) = z\Big] \\ &= \Pr[h(X_{i+1}) = z] \cdot \frac{\Pr_{y \in_U T(i+1)}\big[\bigwedge_k (y_k \in I_k)\big]}{\Pr_{y \in_U T(i+1)}[h(y) = z]}. \end{aligned}$$

Combining the above equation with Lemmas B.4 and B.5, we can compute $q$ in time $n^{O(m)}$. Therefore, by Equation B.6, we can compute $P(v_1, \ldots, v_m, u)$ in time $(n^{O(m)}(\log R)/\eta)^m$. Lemma B.3 now follows by induction.